BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Taasby
Calcite | Level 5

Hi Everyone

 

I have two datasets which I aim to merge.

 

First dataset looks like this (some of it):

 

"HomeTeam","AwayTeam","Date","B365H","B365D","B365A","p_home_win","p_draw","p_away_win"
"Man City","Blackpool","01/01/2011","1.29","5.5","10","56.170","24.346","19.484"
"Stoke","Everton","01/01/2011","2.63","3.2","2.75","21.027","27.827","51.146"
"Sunderland","Blackburn","01/01/2011","1.83","3.5","4.5","60.521","24.321","15.158"
"Fulham","West Brom","01/04/2011","2.2","3.25","3.4","24.902","23.708","51.390"
"Aston Villa","Sunderland","01/05/2011","2.2","3.25","3.4","69.424","16.212","14.353"
"Newcastle","West Ham","01/05/2011","1.83","3.6","4.33","58.885","18.826","22.284"
"Wolves","Chelsea","01/05/2011","8","4.2","1.44","66.943","16.625","16.419"
"Chelsea","Blackburn","01/15/2011","1.29","5.5","11","89.759","6.747","3.281"
"Man City","Wolves","01/15/2011","1.33","5","10","45.785","27.370","26.845"
"Stoke","Bolton","01/15/2011","2.1","3.25","3.75","56.967","24.800","18.233"
"West Brom","Blackpool","01/15/2011","1.8","3.6","4.5","59.517","18.727","21.751"
"West Ham","Arsenal","01/15/2011","6","4","1.57","6.850","15.515","77.632"
"Wigan","Fulham","01/15/2011","2.5","3.25","2.88","20.793","27.330","51.877"
"Birmingham","Aston Villa","01/16/2011","2.4","3.2","3.1","23.311","29.941","46.747"
"Liverpool","Everton","01/16/2011","2.2","3.2","3.5","64.537","23.468","11.994"
"Sunderland","Newcastle","01/16/2011","2.2","3.25","3.4","44.228","33.868","21.903"
"Arsenal","Wigan","01/22/2011","1.2","6.5","15","49.116","23.552","27.332"
"Everton","West Ham","01/22/2011","1.5","4","7.5","52.747","25.312","21.941"
"Fulham","Stoke","01/22/2011","2.2","3.2","3.5","71.301","20.783","7.916"
"Man United","Birmingham","01/22/2011","1.2","6.5","15","63.496","23.443","13.062"
"Newcastle","Tottenham","01/22/2011","3.5","3.4","2.1","33.698","26.365","39.936"
"Wolves","Liverpool","01/22/2011","3.4","3.2","2.25","9.437","15.631","74.927"
"Blackburn","West Brom","01/23/2011","2.25","3.3","3.25","53.162","29.265","17.574"
"Bolton","Chelsea","01/24/2011","6","3.75","1.62","29.028","23.656","47.317"
"Wigan","Aston Villa","01/25/2011","3.2","3.25","2.3","53.062","23.313","23.625"
"Liverpool","Fulham","01/26/2011","1.53","3.8","7","73.975","16.967","9.056"
"Arsenal","Everton","02/01/2011","1.4","4.5","8.5","51.904","21.570","26.525"
"Man United","Aston Villa","02/01/2011","1.33","5","9.5","67.411","23.158","9.431"
"Sunderland","Chelsea","02/01/2011","5.5","3.6","1.67","19.029","33.295","47.675"

 

 

Second dataset looks like this (some of it):

 

"HomeTeam","AwayTeam","Date","B365H","B365D","B365A","HomeTeam_hotness_beforegame","AwayTeam_hotness_beforegame"
"Arsenal","Chelsea","12/27/2010","2.63","3.2","2.75","3.7604757667","-3.760475767"
"Birmingham","Man_United","12/28/2010","6.5","4.33","1.5","-0.63663219","0.6366321898"
"Man_City","Aston_Villa","12/28/2010","1.44","4.2","8","1.6836796735","-1.683679673"
"Stoke","Fulham","12/28/2010","1.83","3.4","4.75","2.9566006505","-2.956600651"
"Sunderland","Blackpool","12/28/2010","1.62","3.75","6","-0.723181362","0.7231813625"
"Tottenham","Newcastle","12/28/2010","1.53","4","6.5","2.2847551983","-2.284755198"
"West_Brom","Blackburn","12/28/2010","1.83","3.5","4.5","0.382658028","-0.382658028"
"West_Ham","Everton","12/28/2010","3.4","3.2","2.25","1.7574369645","-1.757436964"
"Chelsea","Bolton","12/29/2010","1.29","5.5","11","-3.928073866","3.9280738659"
"Liverpool","Wolves","12/29/2010","1.33","5","10","-0.822763551","0.8227635506"
"Wigan","Arsenal","12/29/2010","6.5","4.2","1.5","-0.538853648","0.5388536475"
"Birmingham","Arsenal","01/01/2011","5.5","3.6","1.67","-0.181423191","0.1814231907"
"Liverpool","Bolton","01/01/2011","1.7","3.75","5","-1.100856462","1.1008564622"
"Man_City","Blackpool","01/01/2011","1.29","5.5","10","-1.022399727","1.0223997272"
"Stoke","Everton","01/01/2011","2.63","3.2","2.75","1.0295768053","-1.029576805"
"Sunderland","Blackburn","01/01/2011","1.83","3.5","4.5","-0.249932422","0.249932422"
"Tottenham","Fulham","01/01/2011","1.4","4.5","8.5","2.4818040774","-2.481804077"
"West_Brom","Man_United","01/01/2011","6","4.2","1.53","-0.026691257","0.026691257"
"West_Ham","Wolves","01/01/2011","2","3.4","3.8","-0.774912117","0.7749121174"
"Chelsea","Aston_Villa","01/02/2011","1.25","5.5","13","-1.730987424","1.7309874237"
"Wigan","Newcastle","01/02/2011","2.5","3.2","2.9","0.264612696","-0.264612696"
"Blackpool","Birmingham","01/04/2011","2.3","3.25","3.25","2.1186630801","-2.11866308"
"Fulham","West_Brom","01/04/2011","2.2","3.25","3.4","-1.037093869","1.0370938694"
"Man_United","Stoke","01/04/2011","1.25","5.75","12","1.3149114291","-1.314911429"
"Arsenal","Man_City","01/05/2011","1.91","3.6","4","-0.441275477","0.4412754771"
"Aston_Villa","Sunderland","01/05/2011","2.2","3.25","3.4","-0.898485414","0.8984854145"
"Blackburn","Liverpool","01/05/2011","3.4","3.25","2.2","0.0970333453","-0.097033345"
"Bolton","Wigan","01/05/2011","1.73","3.6","5","0.8621463159","-0.862146316"
"Everton","Tottenham","01/05/2011","2.63","3.25","2.75","-2.335397536","2.3353975359"
"Newcastle","West_Ham","01/05/2011","1.83","3.6","4.33","0.1260786995","-0.1260787"
"Wolves","Chelsea","01/05/2011","8","4.2","1.44","3.7990423986","-3.799042399"
"Blackpool","Liverpool","01/12/2011","3.8","3.75","1.91","2.705250589","-2.705250589"
"Chelsea","Blackburn","01/15/2011","1.29","5.5","11","-2.77646534","2.7764653398"
"Man_City","Wolves","01/15/2011","1.33","5","10","-0.716695062","0.7166950617"
"Stoke","Bolton","01/15/2011","2.1","3.25","3.75","-0.184180716","0.1841807157"
"West_Brom","Blackpool","01/15/2011","1.8","3.6","4.5","-2.662987004","2.6629870041"
"West_Ham","Arsenal","01/15/2011","6","4","1.57","0.4998071633","-0.499807163"
"Wigan","Fulham","01/15/2011","2.5","3.25","2.88","-0.658881604","0.6588816037"
"Birmingham","Aston_Villa","01/16/2011","2.4","3.2","3.1","0.0981564779","-0.098156478"
"Liverpool","Everton","01/16/2011","2.2","3.2","3.5","-1.220109433","1.220109433"
"Sunderland","Newcastle","01/16/2011","2.2","3.25","3.4","-0.381697342","0.3816973421"
"Tottenham","Man_United","01/16/2011","2.88","3.3","2.5","-0.808198231","0.808198231"
"Arsenal","Wigan","01/22/2011","1.2","6.5","15","0.4748149132","-0.474814913"
"Aston_Villa","Man_City","01/22/2011","3.5","3.4","2.1","-1.274314497","1.2743144968"
"Blackpool","Sunderland","01/22/2011","2.5","3.25","2.88","1.0883505902","-1.08835059"
"Everton","West_Ham","01/22/2011","1.5","4","7.5","-0.740684864","0.7406848636"
"Fulham","Stoke","01/22/2011","2.2","3.2","3.5","-0.690559941","0.6905599411"
"Man_United","Birmingham","01/22/2011","1.2","6.5","15","0.8826238873","-0.882623887"
"Newcastle","Tottenham","01/22/2011","3.5","3.4","2.1","0.9706523784","-0.970652378"

 

 

I want to add the values for "HomeTeam_hotness_beforegame" variable and the  "AwayTeam_hotness_beforegame" variable to the rows in the first dataset. Doing this by date hometeam awayteam should secure that SAS puts the values for the variable on the correct observation. I've tried using this code:

 

 

proc sort data=poisson_final;
by date hometeam awayteam;
run;

proc sort data=PL_score_1;
by date hometeam awayteam;
run;

data results;
merge poisson_final PL_score_1;
by date;
run;

 

But the output doesn't make sense - see blue box in the picture below. It is like SAS is assigning prior values for "p_home_win", "p_draw" and "p_away_win".

 

image.png

 

Any suggestions on why the merge doesn't do the job for me here?

 

Any help is very much appreciated!

 

Best, Frederik

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Variables in incoming datasets are automatically retained; in a MERGE with BY, they are set to missing only when a group change occurs.

So, if you have, for a given BY value, 3 observations in A and 2 in B, the second and third observation in the result will have the data of the second observation from B.

Either contemplate an additional BY variable to prevent a many-to-many merge, or set the variables to missing before the MERGE.

data results;
if 0 then set poisson_final PL_score_1; /* initializes the PDV */
call missing(of _all_);
merge
  poisson_final
  PL_score_1
;
by date;
run;

Untested, posting from my tablet.

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

Variables in incoming datasets are automatically retained; in a MERGE with BY, they are set to missing only when a group change occurs.

So, if you have, for a given BY value, 3 observations in A and 2 in B, the second and third observation in the result will have the data of the second observation from B.

Either contemplate an additional BY variable to prevent a many-to-many merge, or set the variables to missing before the MERGE.

data results;
if 0 then set poisson_final PL_score_1; /* initializes the PDV */
call missing(of _all_);
merge
  poisson_final
  PL_score_1
;
by date;
run;

Untested, posting from my tablet.

Taasby
Calcite | Level 5

Hi Kurt

 

The code doesn't give me any error messages but looking at the results dataset it seems as some of the values for p_homewin, p_draw, and p_away_win have switched around. Before the merge I had 56.170, 24.346, 19.484 for the Man City - Blackpool game. These values are now shown for Birmingham-Arsenal. These two games were played on the same date - so maybe something goes wrong here?

image.png

 

Appreciates any further suggestions!

 

Frederik

 

Kurt_Bremser
Super User

When you MERGE datasets that contain identically named columns (apart from those in the BY statement), the exact order of how the reads occur will determine which values prevail in the resulting dataset.

So, with a data step MERGE, you should try to

  • avoid a many-to-many join
  • rename or drop variables (with dataset options) to avoid the collisions

See here

Taasby
Calcite | Level 5

Hi Kurt

 

I've tried to add "hometeam" and "awayteam" to the by statement giving me this bit of code:

data results;
if 0 then set poisson_final PL_score_1; /* initializes the PDV */
call missing(of _all_);
merge
  poisson_final
  PL_score_1
;
by date hometeam awayteam;
run;

(…) and this output:

image.png

 

Now it doesn't confuse the p_variables - they stay in the right place. Unfortunately there are a lot of duplicates and the correct value for the hotness_variables shows in the duplicate whitout the p_variables...

 

/Frederik

Reeza
Super User
If you're doing a many to many join you really don't want to use a data step, you should use SQL. It's too hard to understand/control and you get unexpected behaviour. In general for joins, I find SQL better 90% of the time.
Taasby
Calcite | Level 5

Having a second look at my data I discovered that some of the teams - in the HomeTeam and AwayTeam variables - were named slightly different. Fixing this and adding "hometeam" and "awayteam" to the by statement in this bit of code did the job for me!

 

THANKS once again Kurt!

 

/Frederik

Tom
Super User Tom
Super User

It is impossible to follow what you are doing. Please create a simplified example with just enough data and variables to replicate the issue.

 

If you are doing a many to one merge (essentially a look-up) then the NEW variable must not already exist on the many dataset.

 

data many ;
  input id color $ ;
cards;
1 red
1 blue 
2 orange
;
data one;
  input id color $ ;
cards;
1 green 
2 yellow 
;
data bad;
  merge many one ;
  by id;
run;

data good ;
  merge many(drop=color) one;
  by id;
run;

 

Taasby
Calcite | Level 5

Hi Tom

 

Sorry for the confusion. I found out now that Kurt's bit of code above does the job for me! But thanks for your suggestions anyway!!

 

Best,

Frederik

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1428 views
  • 2 likes
  • 4 in conversation