Hi,
I am trying to merge these two data sets:
data example1;
  input company $ employee age amount prob month;
  datalines;
ABC 01 24 50000 90 5
ABC 01 24 0 10 7
;
run;

data example2;
  input company $ employee age amount prob month;
  datalines;
ABC 03 25 21000 90 5
ABC 03 25 0 10 7
;
run;
Proc sql;
create table combined as
select a.company,a.employee,a.amount as ex1_amount,b.amount as ex2_amount,a.prob,a.month
from example1 as a left join example2 as b on a.company=b.company;
quit;
Expected Output:
ABC 1 50000 21000 90 5
ABC 1 0 0 10 7
Maybe you want to explicitly require that the data also match on month?
proc sql;
  create table combined as
  select a.company, a.employee, a.amount as ex1_amount, b.amount as ex2_amount, a.prob, a.month
  from example1 as a left join example2 as b
    on a.company = b.company and a.month = b.month;
quit;
Though I have a sneaking feeling that, since you are ignoring the employee value, something else is going on and you may be taking a complex approach to transposition.
Your expectations about SQL are incorrect. SQL does not match observation by observation. Instead, it finds all matches, so you will get:
First ABC from EXAMPLE1, matched with first ABC from EXAMPLE2
First ABC from EXAMPLE1, matched with second ABC from EXAMPLE2
Second ABC from EXAMPLE1, matched with first ABC from EXAMPLE2
Second ABC from EXAMPLE1, matched with second ABC from EXAMPLE2
With a join on company alone, SQL is simply the wrong tool for producing your expected output.
This is a many-to-many join, which produces 2 x 2 = 4 records, and that is not what you want.
I would suggest a data step merge instead.
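A minimal sketch of the data step merge suggested above. It assumes that company and month together identify the rows to pair up (the original post does not state the key), and the output names ex1_amount and ex2_amount are taken from the question's query:

```sas
/* MERGE with a BY statement needs both inputs sorted by the BY variables. */
proc sort data=example1; by company month; run;
proc sort data=example2; by company month; run;

data combined;
  /* KEEP= is applied before RENAME=, so it refers to the original name amount. */
  merge example1 (in=in1
                  keep=company employee amount prob month
                  rename=(amount=ex1_amount))
        example2 (keep=company month amount
                  rename=(amount=ex2_amount));
  by company month;   /* assumed matching key */
  if in1;             /* keep only rows present in example1, like a left join */
run;
```

With the sample data this gives one row per month, pairing 50000 with 21000 for month 5 and 0 with 0 for month 7. If the real data has repeated company/month combinations on both sides, the merge pairs them positionally, so the key should be checked first.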
The other responders have answered your question, but just to help with your knowledge I'm going to add this, which I've used with a number of colleagues over the years.
A SQL join starts out (in the case of a two-table join) by matching every record from the first table to every record from the second table. So, as @Astounding and @Reeza explain, in the case of two tables with two records each, you start out with 2 x 2 = 4 result records.
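To see that untrimmed starting point on the example data, a sketch like the following may help. The table name all_pairs and the column aliases are made up for illustration:

```sas
proc sql;
  /* No ON or WHERE clause: every row of example1 is paired
     with every row of example2 (a Cartesian product). */
  create table all_pairs as
  select a.company,
         a.month  as ex1_month,
         a.amount as ex1_amount,
         b.month  as ex2_month,
         b.amount as ex2_amount
  from example1 as a, example2 as b;
quit;
```

With two rows in each input, all_pairs contains four rows, one for each combination of an example1 row and an example2 row.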
There are actually cases where this is useful, but they are rare. So to get the desired results, you use other SQL clauses to "trim away" the result records you don't want.
The first is the left / right / inner join phrasing, accompanied by an "on" clause. It tells SQL, out of the enormous number of result records, to keep ONLY the ones where a field from the left table matches a field from the right table, plus, for a left or right join, the records from that side that have no match. This usually reduces the number of result records from a x b to the record count of a or b, or fewer depending on matching, which is usually more in line with what you want.
Once SQL has reduced the number of records based on your join logic, you can then reduce it even further using the "where" clause, which tells SQL to additionally only keep the records from the first part that match the conditions that you specify.
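As a rough illustration of the two trimming stages on the example data: the "on" clause below is the company and month match suggested earlier in the thread, while the filter on month 5 and the table name combined_may are hypothetical, chosen only to show where a "where" clause fits:

```sas
proc sql;
  create table combined_may as
  select a.company,
         a.employee,
         a.amount as ex1_amount,
         b.amount as ex2_amount,
         a.prob,
         a.month
  from example1 as a
       left join example2 as b
         on a.company = b.company
        and a.month   = b.month    /* stage 1: trim pairs via the join condition */
  where a.month = 5;               /* stage 2: trim the joined rows further */
quit;
```

On the sample data the join condition leaves two rows (one per month), and the where clause then keeps only the month 5 row.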
So, in conclusion, I always consider a SQL join as creating a massive result set, which I then trim away using the different language options.
Hope this helps,
Tom