Harmandeep
Fluorite | Level 6

Hi,

 

I am trying to merge these two data sets:

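/* First data set: two records for company ABC */
data example1;
   input company $ employee age amount prob month;
   datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

/* Second data set: two records for the same company and the same months */
data example2;
   input company $ employee age amount prob month;
   datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;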

Proc sql;
create table combined as
select a.company,a.employee,a.amount as ex1_amount,b.amount as ex2_amount,a.prob,a.month
from example1 as a left join example2 as b on a.company=b.company;
quit;

Expected Output:

 

ABC 1 50000 21000 90 5

ABC 1 0 0 10 7

 

 

Accepted Solution
ballardw
Super User

Maybe you want to state explicitly that the data should also match on month?

 

Proc sql;
   create table combined as
   select a.company,a.employee,a.amount as ex1_amount, b.amount as ex2_amount,a.prob,a.month
   from example1 as a left join example2 as b 
        on    a.company =b.company
          and a.month   =b.month
 ;
quit;

Though I have a sneaking feeling that, since you are ignoring the employee value, something else is going on and you are possibly taking a complex approach to transposition.

 

Astounding
PROC Star

Your expectations about SQL are incorrect.  SQL does not match observation by observation.  Instead, it finds all matches, so you will get:

 

First ABC from EXAMPLE1, matched with first ABC from EXAMPLE2

First ABC from EXAMPLE1, matched with second ABC from EXAMPLE2

Second ABC from EXAMPLE1, matched with first ABC from EXAMPLE2

Second ABC from EXAMPLE1, matched with second ABC from EXAMPLE2

 

SQL is just the wrong tool for the job, to produce your expected output.
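To see the four pairings listed above for yourself, here is a minimal sketch that reruns the original join (company only) and keeps the month from both tables, so each result row shows which EXAMPLE2 record it was paired with. The table name all_matches and the ex1_/ex2_ column names are made up for this illustration.

```sas
/* Same join condition as in the question: company only.          */
/* all_matches, ex1_month and ex2_month are illustrative names.   */
proc sql;
   create table all_matches as
   select a.company,
          a.amount as ex1_amount,
          b.amount as ex2_amount,
          a.month  as ex1_month,
          b.month  as ex2_month
   from example1 as a left join example2 as b
        on a.company = b.company;
quit;

/* Inspect the result: one row per pairing of an EXAMPLE1 record  */
/* with an EXAMPLE2 record.                                       */
proc print data=all_matches;
run;
```

With the two-record data sets from the question, the printed table should contain four rows, one for each pairing. SQL does not guarantee the row order unless you add an ORDER BY clause.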

Reeza
Super User

This is a many-to-many join, which produces 2 x 2 = 4 records, which is not what you want.

 

I would suggest a data step merge instead. 
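As a rough sketch of what that could look like, the step below merges the two data sets from the question by company and month. Matching on month as well as company is an assumption on my part, made so that each BY group holds only one record per data set. Because both data sets use the same variable names, the amounts are renamed on the way in, and only the key variables and the amount are read from EXAMPLE2 so its employee, age and prob values do not overwrite those from EXAMPLE1.

```sas
/* MERGE with a BY statement needs both inputs sorted by the BY variables. */
proc sort data=example1; by company month; run;
proc sort data=example2; by company month; run;

data combined;
   /* KEEP= is applied before RENAME=, so it lists the original names. */
   merge example1 (in=in1 rename=(amount=ex1_amount))
         example2 (keep=company month amount rename=(amount=ex2_amount));
   by company month;
   if in1;   /* keep only records present in EXAMPLE1, like a left join */
   keep company employee ex1_amount ex2_amount prob month;
run;
```

With the sample data this should give the two rows from the expected output, although the columns may appear in a different order than in the SQL version.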

TomKari
Onyx | Level 15

The other responders have answered your question, but just to help with your knowledge I'm going to add this, which I've used with a number of colleagues over the years.

 

A SQL join starts out (in the case of a two-table join) by matching every record from the first table to every record from the second table. So, as @Astounding and @Reeza explain, in the case of two tables with two records each, you start out with 2 x 2 or 4 result records.

 

There are actually cases where this is useful, but they are rare. So to get the desired results, you use other SQL clauses to "trim away" the result records you don't want.

 

The first is the left / right / inner join phrasing, accompanied by an "on" clause. It tells SQL which of the enormous number of result records to keep: an inner join keeps ONLY the records where the "on" condition holds (typically a field from the left table matching a field from the right table), while a left or right join also keeps every record from the left or the right table, even when it has no match. This usually reduces the number of result records from a x b to the record count of a or b, or less depending on matching, which is usually more in line with what you want.

 

Once SQL has reduced the number of records based on your join logic, you can then reduce it even further using the "where" clause, which tells SQL to additionally only keep the records from the first part that match the conditions that you specify.

 

So, in conclusion, I always consider a SQL join as creating a massive result set, which I then trim away using the different language options.
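Here is a small sketch of that "build everything, then trim" way of thinking, using the two data sets from the question. It is only meant to show the idea in two visible steps; in practice you would write a single join with an "on" clause. The table names every_pair and trimmed and the ex2_ column names are made up for the example.

```sas
proc sql;
   /* Step 1: no join condition at all, so every record of example1 */
   /* is paired with every record of example2 (2 x 2 = 4 records).  */
   create table every_pair as
   select a.company, a.employee,
          a.amount  as ex1_amount,
          b.amount  as ex2_amount,
          a.prob, a.month,
          b.company as ex2_company,
          b.month   as ex2_month
   from example1 as a, example2 as b;

   /* Step 2: trim away the pairings you do not want by keeping     */
   /* only those where company and month agree.                     */
   create table trimmed as
   select company, employee, ex1_amount, ex2_amount, prob, month
   from every_pair
   where company = ex2_company
     and month   = ex2_month;
quit;
```

With the sample data, every_pair should hold four records and trimmed should hold the two you were expecting.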

 

Hope this helps,
   Tom
