BookmarkSubscribeRSS Feed
JibJam221
Obsidian | Level 7

hello all, my goal with this code is the have only records from table A to be joined with only records that match the conditions, from table B, as to pull the columns from table B to complete my table. 

 

Here is my code:

 

 

proc sql;
create table CombinedTable as select A.*, B.Phone_number,
B.Email, B.Home_address, B.Secondary address
from TableA as A left join TableB as B
	on A.EmpName=B.Name and A.ZIP=B.ZIP;
	quit;

As you can see, there are two conditions for the join because there are employees with the same name, so we are using EmpName and ZIP to confirm that we are receiving the employee info that belong in TableA.

When I run this, I recieve records from table B that were not originally in Table A. This is an issue as I am only looking for employees in Table A, plus any employees that match the conditions from table B, to have their information joined. 

Im not sure how to fix this or what to do. I though left joins only keep records from Table A, and only takes records that match from Table B

 

5 REPLIES 5
Patrick
Opal | Level 21

"I though left joins only keep records from Table A, and only takes records that match from Table B"

That's correct and what your SQL is doing. There must be something else happening like data replication due to 1:many or even many:many relations that make you believe you're getting rows in the result set that don't exist in your table A.

 

Below SQL might help you to investigate your data further.

proc sql;
  create table investigate as 
    select 
    a.empname as a_empname
    ,a.zip as a_zip
    ,b.empname as b_empname
    ,b.zip as b_zip
    ,count(*) as n_rows
    from 
      tablea as a 
      left join 
      tableb as b
        on 
          a.empname=b.name 
          and a.zip=b.zip
      group by 
        a.empname 
        ,a.zip
        ,b.empname
        ,b.zip
      having count(*)>1
  ;
quit;

 

Below code lets you verify that there are really no rows in your result table CombinedTable that don't have a matching business key to your source TableA. 

Table not_in_a will have zero rows because there won't be any business keys without a match.

data not_in_a;
  if _n_=1 then
    do;
      dcl hash h1(dataset:"TableA");
      h1.defineKey('EmpName','ZIP');
      h1.defineDone();
    end;
  set CombinedTable;
  if h1.check() ne 0 then output;
run;

 

JibJam221
Obsidian | Level 7

is it possible that its pulling other employees with the Zipcodes that match those in TableA, even though the EmpNames arent a match? 

Table B is a database that holds data at a national level for our company, so there are definitely people with the same zipcodes in table B. However table A is only a subset of people with those zip codes. So im wondering if theres maybe a way to pull info based on two conditions that are linked?

i.e. only pull from table B is the employee's name is Sam Smith and his zipcode is XXXX? 

 

because after doing some verification work, I see that table A is now populated with people from Table B, when that isnt the intention

Patrick
Opal | Level 21

"is it possible that its pulling other employees with the Zipcodes that match those in TableA, even though the EmpNames arent a match? "

Not if the code you've shared reflects what you're actually running.

 

If you run the data step code I've shared do you get any rows?

Kurt_Bremser
Super User

@JibJam221 wrote:

 

I see that table A is now populated with people from Table B


This cannot be, as your code creates a new table and does not replace TableA. So the names must have been in TableA all the time.

Tom
Super User Tom
Super User

Nope.

Your output dataset will have values from the B variables listed in the SELECT 

, B.Phone_number
, B.Email
, B.Home_address
, B.Secondary as address

When ever there is an observation that matches A on both the name fields and the ZIP field.

 

Two things to check.

1) Do any of the matches have multiple observations coming from B?  If so that will increase the number of observations written.  For example 2 input duplicate name/zip combos in A that match 3 replicant name/zip combo in B will result in 2*3=6 observations written.

2) Remember that if ZIP is missing in both dataset that is a MATCH.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 522 views
  • 0 likes
  • 4 in conversation