Hi,
I am currently using SAS Enterprise Guide, and wish to merge two data sets based on a common variable. I wish to do a left join I believe, as I wish to keep every data row from the left table, and match with the right table to add an extra variable onto the left table. See below for an example
Table1 Table2
Animal Name Age Colour Animal Weight
Dog 1 5 Blue Dog 7
Cat 2 4 Black Cat 9
Fish 3 8 Blue Fish 2
Dog 4 6 Red
Cat 1 2 Blue
And I wish for it to end up as
Merge
Animal Name Age Colour Weight
Dog 1 5 Blue 7
Cat 2 4 Black 9
Fish 3 8 Blue 2
Dog 4 6 Red 7
Cat 1 2 Blue 9
So some data from the right table may have to match up to many on the left.
I am currently using this code :
PROC SQL;
CREATE TABLE Merge AS
SELECT t1.Animal,
t1.Name,
t2.Colour,
FROM Table1 t1
LEFT JOIN Table2 t2 ON (t1.animal = t2.animal);
QUIT;
However when I do this, Instead of keeping the original 5 data entries from table 1, i appear to result in more data entries, and as this is a large data file, i cannot easily single out where or what these extra rows are or where they came from.
Thanks!
The information you provide does not illustrate your problem. Look at the below code (and please pay particular attention to the formatting of the code, and the use of the code window which is the {i} above post area - code readability is very important):
data have1; input animal $ name age colour $; datalines; Dog 1 5 Blue Cat 2 4 Black Fish 3 8 Blue Dog 4 6 Red Cat 1 2 Blue ; run; data have2; input animal $ weight; datalines; Dog 7 Cat 9 Fish 2 ; run; proc sql; create table want as select t1.*, t2.weight from have1 t1 left join have2 t2 on t1.animal=t2.animal; quit;
If you run this code you will see that in want there is the correct number of rows, and is joined correctly, therefore I cannot replicate your problem. What I suspect is happening is that you have multiple records in have 1, which each match one record on the left, thus duplicating out the left rows, i.e. if Dog appeared twice in the have2 dataset, then that would duplicate each instance of Dog on the left.
Please also note how I have written the test data in the form of a datastep - not here to type in test data to answer a question.
Do a data step merge:
proc sort data=table1;
by animal;
run;
proc sort data=table2;
by animal;
run;
data want;
merge
table1 (in=a)
table2 (in=b)
;
by animal;
if a;
run;
and look at the log. If you get a NOTE about more than one dataset with repeated by values, you have found your culprit.
Double-check your TABLE2. Is it possible that it contains two entries for the same ANIMAL? That would create the problem that you are describing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.