Hi Everyone
I have two datasets which I aim to merge.
First dataset looks like this (some of it):
"HomeTeam","AwayTeam","Date","B365H","B365D","B365A","p_home_win","p_draw","p_away_win" "Man City","Blackpool","01/01/2011","1.29","5.5","10","56.170","24.346","19.484" "Stoke","Everton","01/01/2011","2.63","3.2","2.75","21.027","27.827","51.146" "Sunderland","Blackburn","01/01/2011","1.83","3.5","4.5","60.521","24.321","15.158" "Fulham","West Brom","01/04/2011","2.2","3.25","3.4","24.902","23.708","51.390" "Aston Villa","Sunderland","01/05/2011","2.2","3.25","3.4","69.424","16.212","14.353" "Newcastle","West Ham","01/05/2011","1.83","3.6","4.33","58.885","18.826","22.284" "Wolves","Chelsea","01/05/2011","8","4.2","1.44","66.943","16.625","16.419" "Chelsea","Blackburn","01/15/2011","1.29","5.5","11","89.759","6.747","3.281" "Man City","Wolves","01/15/2011","1.33","5","10","45.785","27.370","26.845" "Stoke","Bolton","01/15/2011","2.1","3.25","3.75","56.967","24.800","18.233" "West Brom","Blackpool","01/15/2011","1.8","3.6","4.5","59.517","18.727","21.751" "West Ham","Arsenal","01/15/2011","6","4","1.57","6.850","15.515","77.632" "Wigan","Fulham","01/15/2011","2.5","3.25","2.88","20.793","27.330","51.877" "Birmingham","Aston Villa","01/16/2011","2.4","3.2","3.1","23.311","29.941","46.747" "Liverpool","Everton","01/16/2011","2.2","3.2","3.5","64.537","23.468","11.994" "Sunderland","Newcastle","01/16/2011","2.2","3.25","3.4","44.228","33.868","21.903" "Arsenal","Wigan","01/22/2011","1.2","6.5","15","49.116","23.552","27.332" "Everton","West Ham","01/22/2011","1.5","4","7.5","52.747","25.312","21.941" "Fulham","Stoke","01/22/2011","2.2","3.2","3.5","71.301","20.783","7.916" "Man United","Birmingham","01/22/2011","1.2","6.5","15","63.496","23.443","13.062" "Newcastle","Tottenham","01/22/2011","3.5","3.4","2.1","33.698","26.365","39.936" "Wolves","Liverpool","01/22/2011","3.4","3.2","2.25","9.437","15.631","74.927" "Blackburn","West Brom","01/23/2011","2.25","3.3","3.25","53.162","29.265","17.574" "Bolton","Chelsea","01/24/2011","6","3.75","1.62","29.028","23.656","47.317" "Wigan","Aston Villa","01/25/2011","3.2","3.25","2.3","53.062","23.313","23.625" "Liverpool","Fulham","01/26/2011","1.53","3.8","7","73.975","16.967","9.056" "Arsenal","Everton","02/01/2011","1.4","4.5","8.5","51.904","21.570","26.525" "Man United","Aston Villa","02/01/2011","1.33","5","9.5","67.411","23.158","9.431" "Sunderland","Chelsea","02/01/2011","5.5","3.6","1.67","19.029","33.295","47.675"
Second dataset looks like this (some of it):
"HomeTeam","AwayTeam","Date","B365H","B365D","B365A","HomeTeam_hotness_beforegame","AwayTeam_hotness_beforegame" "Arsenal","Chelsea","12/27/2010","2.63","3.2","2.75","3.7604757667","-3.760475767" "Birmingham","Man_United","12/28/2010","6.5","4.33","1.5","-0.63663219","0.6366321898" "Man_City","Aston_Villa","12/28/2010","1.44","4.2","8","1.6836796735","-1.683679673" "Stoke","Fulham","12/28/2010","1.83","3.4","4.75","2.9566006505","-2.956600651" "Sunderland","Blackpool","12/28/2010","1.62","3.75","6","-0.723181362","0.7231813625" "Tottenham","Newcastle","12/28/2010","1.53","4","6.5","2.2847551983","-2.284755198" "West_Brom","Blackburn","12/28/2010","1.83","3.5","4.5","0.382658028","-0.382658028" "West_Ham","Everton","12/28/2010","3.4","3.2","2.25","1.7574369645","-1.757436964" "Chelsea","Bolton","12/29/2010","1.29","5.5","11","-3.928073866","3.9280738659" "Liverpool","Wolves","12/29/2010","1.33","5","10","-0.822763551","0.8227635506" "Wigan","Arsenal","12/29/2010","6.5","4.2","1.5","-0.538853648","0.5388536475" "Birmingham","Arsenal","01/01/2011","5.5","3.6","1.67","-0.181423191","0.1814231907" "Liverpool","Bolton","01/01/2011","1.7","3.75","5","-1.100856462","1.1008564622" "Man_City","Blackpool","01/01/2011","1.29","5.5","10","-1.022399727","1.0223997272" "Stoke","Everton","01/01/2011","2.63","3.2","2.75","1.0295768053","-1.029576805" "Sunderland","Blackburn","01/01/2011","1.83","3.5","4.5","-0.249932422","0.249932422" "Tottenham","Fulham","01/01/2011","1.4","4.5","8.5","2.4818040774","-2.481804077" "West_Brom","Man_United","01/01/2011","6","4.2","1.53","-0.026691257","0.026691257" "West_Ham","Wolves","01/01/2011","2","3.4","3.8","-0.774912117","0.7749121174" "Chelsea","Aston_Villa","01/02/2011","1.25","5.5","13","-1.730987424","1.7309874237" "Wigan","Newcastle","01/02/2011","2.5","3.2","2.9","0.264612696","-0.264612696" "Blackpool","Birmingham","01/04/2011","2.3","3.25","3.25","2.1186630801","-2.11866308" "Fulham","West_Brom","01/04/2011","2.2","3.25","3.4","-1.037093869","1.0370938694" "Man_United","Stoke","01/04/2011","1.25","5.75","12","1.3149114291","-1.314911429" "Arsenal","Man_City","01/05/2011","1.91","3.6","4","-0.441275477","0.4412754771" "Aston_Villa","Sunderland","01/05/2011","2.2","3.25","3.4","-0.898485414","0.8984854145" "Blackburn","Liverpool","01/05/2011","3.4","3.25","2.2","0.0970333453","-0.097033345" "Bolton","Wigan","01/05/2011","1.73","3.6","5","0.8621463159","-0.862146316" "Everton","Tottenham","01/05/2011","2.63","3.25","2.75","-2.335397536","2.3353975359" "Newcastle","West_Ham","01/05/2011","1.83","3.6","4.33","0.1260786995","-0.1260787" "Wolves","Chelsea","01/05/2011","8","4.2","1.44","3.7990423986","-3.799042399" "Blackpool","Liverpool","01/12/2011","3.8","3.75","1.91","2.705250589","-2.705250589" "Chelsea","Blackburn","01/15/2011","1.29","5.5","11","-2.77646534","2.7764653398" "Man_City","Wolves","01/15/2011","1.33","5","10","-0.716695062","0.7166950617" "Stoke","Bolton","01/15/2011","2.1","3.25","3.75","-0.184180716","0.1841807157" "West_Brom","Blackpool","01/15/2011","1.8","3.6","4.5","-2.662987004","2.6629870041" "West_Ham","Arsenal","01/15/2011","6","4","1.57","0.4998071633","-0.499807163" "Wigan","Fulham","01/15/2011","2.5","3.25","2.88","-0.658881604","0.6588816037" "Birmingham","Aston_Villa","01/16/2011","2.4","3.2","3.1","0.0981564779","-0.098156478" "Liverpool","Everton","01/16/2011","2.2","3.2","3.5","-1.220109433","1.220109433" "Sunderland","Newcastle","01/16/2011","2.2","3.25","3.4","-0.381697342","0.3816973421" "Tottenham","Man_United","01/16/2011","2.88","3.3","2.5","-0.808198231","0.808198231" "Arsenal","Wigan","01/22/2011","1.2","6.5","15","0.4748149132","-0.474814913" "Aston_Villa","Man_City","01/22/2011","3.5","3.4","2.1","-1.274314497","1.2743144968" "Blackpool","Sunderland","01/22/2011","2.5","3.25","2.88","1.0883505902","-1.08835059" "Everton","West_Ham","01/22/2011","1.5","4","7.5","-0.740684864","0.7406848636" "Fulham","Stoke","01/22/2011","2.2","3.2","3.5","-0.690559941","0.6905599411" "Man_United","Birmingham","01/22/2011","1.2","6.5","15","0.8826238873","-0.882623887" "Newcastle","Tottenham","01/22/2011","3.5","3.4","2.1","0.9706523784","-0.970652378"
I want to add the values for "HomeTeam_hotness_beforegame" variable and the "AwayTeam_hotness_beforegame" variable to the rows in the first dataset. Doing this by date hometeam awayteam should secure that SAS puts the values for the variable on the correct observation. I've tried using this code:
proc sort data=poisson_final;
by date hometeam awayteam;
run;
proc sort data=PL_score_1;
by date hometeam awayteam;
run;
data results;
merge poisson_final PL_score_1;
by date;
run;
But the output doesn't make sense - see blue box in the picture below. It is like SAS is assigning prior values for "p_home_win", "p_draw" and "p_away_win".
Any suggestions on why the merge doesn't do the job for me here?
Any help is very much appreciated!
Best, Frederik
Variables in incoming datasets are automatically retained; in a MERGE with BY, they are set to missing only when a group change occurs.
So, if you have, for a given BY value, 3 observations in A and 2 in B, the second and third observation in the result will have the data of the second observation from B.
Either contemplate an additional BY variable to prevent a many-to-many merge, or set the variables to missing before the MERGE.
data results;
if 0 then set poisson_final PL_score_1; /* initializes the PDV */
call missing(of _all_);
merge
poisson_final
PL_score_1
;
by date;
run;
Untested, posting from my tablet.
Variables in incoming datasets are automatically retained; in a MERGE with BY, they are set to missing only when a group change occurs.
So, if you have, for a given BY value, 3 observations in A and 2 in B, the second and third observation in the result will have the data of the second observation from B.
Either contemplate an additional BY variable to prevent a many-to-many merge, or set the variables to missing before the MERGE.
data results;
if 0 then set poisson_final PL_score_1; /* initializes the PDV */
call missing(of _all_);
merge
poisson_final
PL_score_1
;
by date;
run;
Untested, posting from my tablet.
Hi Kurt
The code doesn't give me any error messages but looking at the results dataset it seems as some of the values for p_homewin, p_draw, and p_away_win have switched around. Before the merge I had 56.170, 24.346, 19.484 for the Man City - Blackpool game. These values are now shown for Birmingham-Arsenal. These two games were played on the same date - so maybe something goes wrong here?
Appreciates any further suggestions!
Frederik
When you MERGE datasets that contain identically named columns (apart from those in the BY statement), the exact order of how the reads occur will determine which values prevail in the resulting dataset.
So, with a data step MERGE, you should try to
See here
Hi Kurt
I've tried to add "hometeam" and "awayteam" to the by statement giving me this bit of code:
data results;
if 0 then set poisson_final PL_score_1; /* initializes the PDV */
call missing(of _all_);
merge
poisson_final
PL_score_1
;
by date hometeam awayteam;
run;
(…) and this output:
Now it doesn't confuse the p_variables - they stay in the right place. Unfortunately there are a lot of duplicates and the correct value for the hotness_variables shows in the duplicate whitout the p_variables...
/Frederik
Having a second look at my data I discovered that some of the teams - in the HomeTeam and AwayTeam variables - were named slightly different. Fixing this and adding "hometeam" and "awayteam" to the by statement in this bit of code did the job for me!
THANKS once again Kurt!
/Frederik
It is impossible to follow what you are doing. Please create a simplified example with just enough data and variables to replicate the issue.
If you are doing a many to one merge (essentially a look-up) then the NEW variable must not already exist on the many dataset.
data many ;
input id color $ ;
cards;
1 red
1 blue
2 orange
;
data one;
input id color $ ;
cards;
1 green
2 yellow
;
data bad;
merge many one ;
by id;
run;
data good ;
merge many(drop=color) one;
by id;
run;
Hi Tom
Sorry for the confusion. I found out now that Kurt's bit of code above does the job for me! But thanks for your suggestions anyway!!
Best,
Frederik
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.