Harmandeep
Fluorite | Level 6

Hi,

I am trying to merge these two data sets:

Data example1;
Input company$ employee age amount prob month;
datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;
data example2;
Input company$ employee age amount prob month;
datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;

Proc sql;
create table combined as
select a.company,a.employee,a.amount as ex1_amount,b.amount as ex2_amount,a.prob,a.month
from example1 as a left join example2 as b on a.company=b.company;
quit;

Expected output:

company  employee  ex1_amount  ex2_amount  prob  month
ABC      1         50000       21000       90    5
ABC      1         0           0           10    7

Accepted solution
ballardw
Super User

Maybe you want to explicitly say that the data should also match on month?

 

Proc sql;
   create table combined as
   select a.company,a.employee,a.amount as ex1_amount, b.amount as ex2_amount,a.prob,a.month
   from example1 as a left join example2 as b 
        on    a.company =b.company
          and a.month   =b.month
 ;
quit;

Though I have a sneaking feeling that, since you are ignoring the employee value, something else is going on and you are possibly taking a complex approach to transposition.
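If that hunch is right and the real goal is one row per company and month with one amount column per employee, a transposition may be simpler than a join. The sketch below only illustrates that assumption: the data set names `stacked` and `wide` and the `amount_emp` prefix are hypothetical, and `prob` is not carried over. Under these assumptions the result should have two rows with columns company, month, amount_emp1 and amount_emp3.

```sas
/* Sample data from the question */
data example1;
  input company $ employee age amount prob month;
  datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

data example2;
  input company $ employee age amount prob month;
  datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;

/* Stack both data sets; BY interleaves them in company/month order
   (both inputs are already sorted that way) */
data stacked;
  set example1 example2;
  by company month;
run;

/* Hypothetical layout: one row per company and month,
   one AMOUNT column per employee (amount_emp1, amount_emp3) */
proc transpose data=stacked out=wide (drop=_name_) prefix=amount_emp;
  by company month;
  id employee;
  var amount;
run;

proc print data=wide noobs;
run;
```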

 


Astounding
PROC Star

Your expectations about SQL are incorrect.  SQL does not match observation by observation.  Instead, it finds all matches, so you will get:

 

First ABC from EXAMPLE1, matched with first ABC from EXAMPLE2

First ABC from EXAMPLE1, matched with second ABC from EXAMPLE2

Second ABC from EXAMPLE1, matched with first ABC from EXAMPLE2

Second ABC from EXAMPLE1, matched with second ABC from EXAMPLE2

 

SQL is just the wrong tool for the job, to produce your expected output.
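To see the pairing described above, here is a small sketch that reruns the question's join on `company` alone but keeps the month from each side. The table name `pairs` and the `ex1_month`/`ex2_month` columns are illustrative only. It should produce four rows, with the month pairs 5/5, 5/7, 7/5 and 7/7, and only two of them have matching months.

```sas
/* Sample data from the question */
data example1;
  input company $ employee age amount prob month;
  datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

data example2;
  input company $ employee age amount prob month;
  datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;

/* Same join condition as the question (company only), but keep the
   month from both sides so you can see which rows were paired */
proc sql;
  create table pairs as
  select a.month  as ex1_month,
         b.month  as ex2_month,
         a.amount as ex1_amount,
         b.amount as ex2_amount
  from example1 as a left join example2 as b
    on a.company = b.company
  order by ex1_month, ex2_month;
quit;

proc print data=pairs noobs;
run;
```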

Reeza
Super User

This is a many-to-many join, which ends up with 2 x 2 = 4 records, which is not what you want.

 

I would suggest a data step merge instead. 
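A minimal DATA step MERGE sketch along those lines. It assumes that matching on both `company` and `month` is what is wanted (the BY variables are not specified in the reply) and drops `age` to match the expected output; the rename targets `ex1_amount` and `ex2_amount` follow the question's SQL. Within each BY group, MERGE pairs observations one-to-one in order instead of forming every combination, so this should give the two expected rows.

```sas
/* Sample data from the question */
data example1;
  input company $ employee age amount prob month;
  datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

data example2;
  input company $ employee age amount prob month;
  datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;

/* MERGE with BY requires both inputs sorted by the BY variables */
proc sort data=example1;
  by company month;
run;

proc sort data=example2;
  by company month;
run;

/* Rename AMOUNT on each side so the values do not overwrite each other,
   and keep only the keys plus AMOUNT from example2 so its EMPLOYEE
   and PROB values do not replace those from example1 */
data combined;
  merge example1 (drop=age rename=(amount=ex1_amount))
        example2 (keep=company month amount rename=(amount=ex2_amount));
  by company month;
run;

proc print data=combined noobs;
run;
```

If you merge BY company alone, the rows would still be paired in order, but SAS writes a NOTE about repeats of BY values and the result then depends on row order, which is why month is included in this sketch.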


TomKari
Onyx | Level 15

The other responders have answered your question, but just to help with your knowledge I'm going to add this, which I've used with a number of colleagues over the years.

 

A SQL join starts out (in the case of a two-table join) by matching every record from the first table to every record from the second table. So, as @Astounding and @Reeza explain, with two tables of two records each you start out with 2 x 2, or 4, result records.

 

There are actually cases where this is useful, but they are rare. So to get the desired results, you use other SQL clauses to "trim away" the result records you don't want.

 

The first is the left / right / inner join phrasing, accompanied by an "on" clause, where you're telling SQL that, out of that enormous number of result records, it should ONLY keep the ones where a field from the left table matches a field from the right table (plus, for a left or right join, the unmatched records from that side). This usually reduces the number of result records from a x b to the record count of a or b, or less depending on matching, which is usually more in line with what you want.

 

Once SQL has reduced the number of records based on your join logic, you can then reduce it even further using the "where" clause, which tells SQL to additionally only keep the records from the first part that match the conditions that you specify.

 

So, in conclusion, I always consider a SQL join as creating a massive result set, which I then trim away using the different language options.
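This model can be tried directly on the question's data. The sketch below (table names `stage1` to `stage3` are illustrative) builds the full product with no join condition, then trims it with an ON clause, then further with a WHERE clause; the row counts should go 4, 2, 1. Treat it as a mental model only: the SQL optimizer does not necessarily build the full product in practice.

```sas
/* Sample data from the question */
data example1;
  input company $ employee age amount prob month;
  datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

data example2;
  input company $ employee age amount prob month;
  datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;

proc sql;
  /* Stage 1: no join condition, so every example1 row is paired
     with every example2 row (2 x 2 = 4 rows) */
  create table stage1 as
  select a.month as ex1_month, b.month as ex2_month
  from example1 as a, example2 as b;

  /* Stage 2: the ON clause keeps only pairs whose company and month agree */
  create table stage2 as
  select a.month as ex1_month, b.month as ex2_month
  from example1 as a inner join example2 as b
    on a.company = b.company and a.month = b.month;

  /* Stage 3: WHERE filters the joined rows even further */
  create table stage3 as
  select a.month as ex1_month, b.month as ex2_month
  from example1 as a inner join example2 as b
    on a.company = b.company and a.month = b.month
  where a.month = 5;
quit;
```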

 

Hope this helps,
   Tom
