I use Proc Sql to Left Join in SAS all the time. But the last few months, I feel like the same code I always used makes the dataset explode instead of actually left join. I know this crazy. I am still just using SAS 9.4.
For example,
proc sql;
create table wrds_cik6
as select *, b.gvkey as comp_gvkey
from wrds_cik5 as a left join wrds_cik_a as b
on a.cik = b.cik;
quit;
wrds_cik5 only has about 180,000 observations. Yet, it returns 300,000,000 observations after this code. Why is it exploding? (I looked on this board for info to an almost identical question earlier, and the answer was, "left join has always done this.")
You likely have duplicate CIK records in each table. If you have 5 CIK in Table1 and 3 in Table2 after the left join you'll have 15 records. You likely need to add another filter to the join or someway to reduce those duplicates.
@cadams47 wrote:
I use Proc Sql to Left Join in SAS all the time. But the last few months, I feel like the same code I always used makes the dataset explode instead of actually left join. I know this crazy. I am still just using SAS 9.4.
For example,
proc sql;
create table wrds_cik6
as select *, b.gvkey as comp_gvkey
from wrds_cik5 as a left join wrds_cik_a as b
on a.cik = b.cik;
quit;
wrds_cik5 only has about 180,000 observations. Yet, it returns 300,000,000 observations after this code. Why is it exploding? (I looked on this board for info to an almost identical question earlier, and the answer was, "left join has always done this.")
Just like Reeza said, it appears that you have duplicate values for the joined variable in at least one of the tables that you are joining.
Here is some code to show this happening:
data work.have1;
col1 = 1;
col2 = 1;
output;
col1 = 2;
col2 = 1;
output;
col1 = 3;
col2 = 1;
output;
run;
data work.have2;
col1 = 1;
col3 = 2;
output;
col1 = 2;
col3 = 2;
output;
col1 = 4;
col3 = 2;
output;
run;
proc sql;
create table work.want as
select *
, b.col3
from work.have1 as a
left join work.have2 as b
on a.col1 = b.col3
;
quit;
Even though we are joining two tables with 3 rows each, the table created has 5 rows.
Ahhh. You're right. I have multiple cik's from my right-side table. Thank you both for the help.
Something like this *might* work around your issue, depending on your data.
proc sql;
create table WRDS_CIK6
as select c5.*, ca.GVKEY as COMP_GVKEY
from WRDS_CIK5 c5
left join
(select unique CIK, GVKEY from WRDS_CIK_A) ca
on c5.CIK = ca.CIK;
quit;
When I run into this issue, I use this construct in SQL. There are other ways in base SAS such as SORT or FREQ, but I usually revert to this construct because I work directly in the RDBMS a lot.
proc sql;
create table test as
select cik, count(0) as count
from wrds_cik5
group by cik
having count(0) > 1;
quit;
proc sql;
create table test as
select cik, count(0) as count
from wrds_cik_a
group by cik
having count(0) > 1;
quit;
The output should be zero records, meaning your keys (group by variables) uniquely identify each record.
For a LEFT JOIN, you can have dups in the left table, but if you have dups in the right table, your target table will "explode".
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.