DATA Step, Macro, Functions and more

Merging/Joining Datasets

Occasional Contributor
Posts: 5

Hi,

 

I am trying to merge these two data sets:

data example1;
   input company $ employee age amount prob month;
   datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

data example2;
   input company $ employee age amount prob month;
   datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;

Proc sql;
create table combined as
select a.company,a.employee,a.amount as ex1_amount,b.amount as ex2_amount,a.prob,a.month
from example1 as a left join example2 as b on a.company=b.company;
quit;

Expected output:

ABC 1 50000 21000 90 5
ABC 1 0 0 10 7


Accepted Solutions
Solution
‎08-25-2017 01:58 PM
Super User
Posts: 13,950

Re: Merging/Joining Datasets

Posted in reply to Harmandeep

Maybe you want to explicitly say that the data should also match on month?

 

Proc sql;
   create table combined as
   select a.company,a.employee,a.amount as ex1_amount, b.amount as ex2_amount,a.prob,a.month
   from example1 as a left join example2 as b 
        on    a.company =b.company
          and a.month   =b.month
 ;
quit;

Though I have a sneaking feeling that, since you are ignoring the employee value, something else is going on and you are possibly taking a complex approach to transposition.

 


All Replies
Super User
Posts: 6,939

Re: Merging/Joining Datasets

Posted in reply to Harmandeep

Your expectations about SQL are incorrect.  SQL does not match observation by observation.  Instead, it finds all matches, so you will get:

 

First ABC from EXAMPLE1, matched with first ABC from EXAMPLE2

First ABC from EXAMPLE1, matched with second ABC from EXAMPLE2

Second ABC from EXAMPLE1, matched with first ABC from EXAMPLE2

Second ABC from EXAMPLE1, matched with second ABC from EXAMPLE2

 

Joined on company alone, SQL is just the wrong tool for producing your expected output.
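
The four matches listed above can be seen directly by selecting the month from both tables. This is only a sketch: it assumes example1 and example2 from the original post already exist, and the column aliases (ex1_month, ex2_month and so on) are mine, added for illustration.

```sas
/* Assumes example1 and example2 from the original post already exist. */
proc sql;
   /* Joining on company alone pairs every ABC row in example1
      with every ABC row in example2. */
   select a.company,
          a.month  as ex1_month,
          a.amount as ex1_amount,
          b.month  as ex2_month,
          b.amount as ex2_amount
   from example1 as a
        left join example2 as b
        on a.company = b.company;
quit;
```

This should print four rows, one per pairing (the row order is not guaranteed), rather than the two rows in the expected output.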

Super User
Posts: 24,027

Re: Merging/Joining Datasets

Posted in reply to Harmandeep

This is a many-to-many join, which ends up with 2 x 2 = 4 records, which is not what you want.

 

I would suggest a data step merge instead. 
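
A data step merge along those lines might look like the sketch below. Matching on month as well as company is an assumption borrowed from the accepted solution, not something stated in this reply, and the kept variables simply mirror the columns selected in the original query.

```sas
/* BY-group processing requires both inputs sorted by the BY variables. */
proc sort data=example1; by company month; run;
proc sort data=example2; by company month; run;

data combined;
   /* KEEP= uses the original variable names; RENAME= is applied afterwards. */
   merge example1 (in=in1
                   keep=company employee amount prob month
                   rename=(amount=ex1_amount))
         example2 (keep=company month amount
                   rename=(amount=ex2_amount));
   by company month;
   if in1;   /* keep only rows present in example1, like a left join */
run;
```

Only company, month and amount are read from example2, so its employee value does not overwrite the one from example1. With the sample data this gives two observations, one per month.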


PROC Star
Posts: 1,334

Re: Merging/Joining Datasets

The other responders have answered your question, but just to help with your knowledge I'm going to add this, which I've used with a number of colleagues over the years.

 

A SQL join starts out (in the case of a two-table join) by matching every record from the first table with every record from the second table. So, as @Astounding and @Reeza explain, in the case of two tables with two records each, you start out with 2 x 2, or 4, result records.

 

There are actually cases where this is useful, but they are rare. So to get the desired results, you use other SQL clauses to "trim away" the result records you don't want.
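
As a small illustration of that starting point (assuming example1 and example2 from the original post exist; n_rows is just an illustrative alias), listing both tables with no join condition gives the full Cartesian product:

```sas
proc sql;
   /* No ON or WHERE clause: every row of example1 is paired
      with every row of example2 (a Cartesian product). */
   select count(*) as n_rows
   from example1, example2;
quit;
```

With two rows in each table, n_rows is 4. SAS also writes a note to the log that the query involves a Cartesian product.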

 

The first is the left / right / inner join phrasing, accompanied by an "on" clause. The "on" clause tells SQL which of the enormous number of result records to keep: only the pairs where a field from the left table matches a field from the right table. A left or right join then also keeps the records from the left or right table that found no match, while an inner join drops them. This usually reduces the number of result records from a x b to roughly the record count of a or b, or fewer depending on matching, which is usually more in line with what you want.
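
To see the difference between join types, one table needs a row with no match. The sketch below builds a hypothetical data set, example2_may, that holds only the month 5 row of example2; that data set and the output table names are my own, purely for illustration.

```sas
/* Hypothetical subset of example2 holding only the month 5 row. */
data example2_may;
   set example2;
   where month = 5;
run;

proc sql;
   /* Left join: both example1 rows survive;
      ex2_amount is missing for month 7. */
   create table left_result as
   select a.company, a.month, a.amount as ex1_amount, b.amount as ex2_amount
   from example1 as a
        left join example2_may as b
        on a.company = b.company and a.month = b.month;

   /* Inner join: only the month 5 row, which has a match, survives. */
   create table inner_result as
   select a.company, a.month, a.amount as ex1_amount, b.amount as ex2_amount
   from example1 as a
        inner join example2_may as b
        on a.company = b.company and a.month = b.month;
quit;
```

Here left_result has two rows and inner_result has one.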

 

Once SQL has reduced the number of records based on your join logic, you can then reduce it even further using the "where" clause, which tells SQL to additionally only keep the records from the first part that match the conditions that you specify.
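
For example, a "where" clause can be added on top of the month-matching join from the accepted solution. The prob >= 50 filter and the table name high_prob are hypothetical, chosen only to show the mechanics.

```sas
proc sql;
   create table high_prob as
   select a.company, a.month, a.amount as ex1_amount,
          b.amount as ex2_amount, a.prob
   from example1 as a
        left join example2 as b
        on a.company = b.company and a.month = b.month
   where a.prob >= 50;   /* hypothetical filter; keeps only the month 5 row */
quit;
```

The join alone yields two rows; the "where" clause then trims that to the single row with prob 90.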

 

So, in conclusion, I always consider a SQL join as creating a massive result set, which I then trim away using the different language options.

 

Hope this helps,
   Tom
