BookmarkSubscribeRSS Feed
Jmills
Calcite | Level 5

Hi,

 

I am currently using SAS Enterprise Guide, and wish to merge two data sets based on a common variable. I wish to do a left join I believe, as I wish to keep every data row from the left table, and match with the right table to add an extra variable onto the left table. See below for an example

 

Table1                                                 Table2

 

Animal  Name  Age  Colour                 Animal  Weight

Dog       1         5       Blue                    Dog      7

Cat        2         4       Black                  Cat       9

Fish      3          8       Blue                   Fish      2

Dog      4          6       Red     

Cat       1          2       Blue

 

 

And I wish for it to end up as 

 

Merge 

 

Animal  Name  Age  Colour  Weight

Dog       1         5       Blue       7

Cat        2         4       Black     9

Fish      3          8       Blue        2

Dog      4          6       Red       7  

Cat       1          2       Blue      9

 

 

So some data from the right table may have to match up to many on the left.

 

I am currently using this code :

 

PROC SQL;
CREATE TABLE Merge AS
SELECT t1.Animal,
t1.Name,
t2.Colour,
FROM Table1 t1
LEFT JOIN Table2 t2 ON (t1.animal = t2.animal);
QUIT;

 

 

However when I do this, Instead of keeping the original 5 data entries from table 1, i appear to result in more data entries, and as this is a large data file, i cannot easily single out where or what these extra rows are or where they came from.

 

Thanks!

3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

The information you provide does not illustrate your problem.  Look at the below code (and please pay particular attention to the formatting of the code, and the use of the code window which is the {i} above post area - code readability is very important):

data have1;
  input animal $ name age colour $;         
datalines;
Dog 1 5 Blue                    
Cat 2 4 Black                  
Fish 3 8 Blue                   
Dog 4 6 Red     
Cat 1 2 Blue
;
run;

data have2;
  input animal $ weight;
datalines;
Dog 7
Cat 9
Fish 2
;
run;

proc sql;
  create table want as
  select t1.*,
         t2.weight
  from   have1 t1
  left join have2 t2
  on     t1.animal=t2.animal;
quit;

If you run this code you will see that in want there is the correct number of rows, and is joined correctly, therefore I cannot replicate your problem.  What I suspect is happening is that you have multiple records in have 1, which each match one record on the left, thus duplicating out the left rows, i.e. if Dog appeared twice in the have2 dataset, then that would duplicate each instance of Dog on the left.

Please also note how I have written the test data in the form of a datastep - not here to type in test data to answer a question.

 

Kurt_Bremser
Super User

Do a data step merge:

proc sort data=table1;
by animal;
run;

proc sort data=table2;
by animal;
run;

data want;
merge
  table1 (in=a)
  table2 (in=b)
;
by animal;
if a;
run;

and look at the log. If you get a NOTE about more than one dataset with repeated by values, you have found your culprit.

Astounding
PROC Star

Double-check your TABLE2.  Is it possible that it contains two entries for the same ANIMAL?  That would create the problem that you are describing.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 651 views
  • 2 likes
  • 4 in conversation