SAS Programming

cadams47 · Posted 05-28-2019 04:33 PM

I use Proc Sql to Left Join in SAS all the time. But the last few months, I feel like the same code I always used makes the dataset explode instead of actually left join. I know this crazy. I am still just using SAS 9.4.

For example,

proc sql;
     create table wrds_cik6
     as select *, b.gvkey as comp_gvkey
     from wrds_cik5 as a left join wrds_cik_a as b
     on a.cik = b.cik;
quit;

wrds_cik5 only has about 180,000 observations. Yet, it returns 300,000,000 observations after this code. Why is it exploding? (I looked on this board for info to an almost identical question earlier, and the answer was, "left join has always done this.")

Reeza · Posted 05-28-2019 04:54 PM

You likely have duplicate CIK records in each table. If you have 5 CIK in Table1 and 3 in Table2 after the left join you'll have 15 records. You likely need to add another filter to the join or someway to reduce those duplicates.

@cadams47 wrote:

I use Proc Sql to Left Join in SAS all the time. But the last few months, I feel like the same code I always used makes the dataset explode instead of actually left join. I know this crazy. I am still just using SAS 9.4.

For example,

proc sql;
     create table wrds_cik6
     as select *, b.gvkey as comp_gvkey
     from wrds_cik5 as a left join wrds_cik_a as b
     on a.cik = b.cik;
quit;

wrds_cik5 only has about 180,000 observations. Yet, it returns 300,000,000 observations after this code. Why is it exploding? (I looked on this board for info to an almost identical question earlier, and the answer was, "left join has always done this.")

Urban_Science · Posted 05-28-2019 05:12 PM

Just like Reeza said, it appears that you have duplicate values for the joined variable in at least one of the tables that you are joining.

Here is some code to show this happening:

data work.have1;
	col1 = 1;
	col2 = 1;
	output;
	col1 = 2;
	col2 = 1;
	output;
	col1 = 3;
	col2 = 1;
	output;
run;

data work.have2;
	col1 = 1;
	col3 = 2;
	output;
	col1 = 2;
	col3 = 2;
	output;
	col1 = 4;
	col3 = 2;
	output;
run;


proc sql;
	create table work.want as 
		select *
			, b.col3
		from work.have1 as a
		left join work.have2 as b
			on a.col1 = b.col3
			;
quit;

Even though we are joining two tables with 3 rows each, the table created has 5 rows.

cadams47 · Posted 05-28-2019 05:58 PM

Ahhh. You're right. I have multiple cik's from my right-side table. Thank you both for the help.

ChrisNZ · Posted 05-28-2019 07:27 PM

Something like this *might* work around your issue, depending on your data.

proc sql;
  create table WRDS_CIK6
  as select c5.*, ca.GVKEY as COMP_GVKEY
  from WRDS_CIK5                                   c5 
         left join 
       (select unique CIK, GVKEY from WRDS_CIK_A)  ca
         on c5.CIK = ca.CIK;
quit;

High-Performance SAS Coding - Third Edition

ScottBass · Posted 05-29-2019 04:38 AM

When I run into this issue, I use this construct in SQL. There are other ways in base SAS such as SORT or FREQ, but I usually revert to this construct because I work directly in the RDBMS a lot.

proc sql;
   create table test as
   select cik, count(0) as count
   from wrds_cik5
   group by cik
   having count(0) > 1;
quit;

proc sql;
   create table test as
   select cik, count(0) as count
   from wrds_cik_a
   group by cik
   having count(0) > 1;
quit;

The output should be zero records, meaning your keys (group by variables) uniquely identify each record.

For a LEFT JOIN, you can have dups in the left table, but if you have dups in the right table, your target table will "explode".

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

SAS Programming

Proc Sql Left Join is exploding

Re: Proc Sql Left Join is exploding

Re: Proc Sql Left Join is exploding

Re: Proc Sql Left Join is exploding

Re: Proc Sql Left Join is exploding

Re: Proc Sql Left Join is exploding

CAS is Fast - Left Join

PROC SQL - Insert

sql nesting multiple left outer joins

proc sql multiplt left joins

Left Join/ Update Problem

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...