- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I use Proc Sql to Left Join in SAS all the time. But the last few months, I feel like the same code I always used makes the dataset explode instead of actually left join. I know this crazy. I am still just using SAS 9.4.
For example,
proc sql;
create table wrds_cik6
as select *, b.gvkey as comp_gvkey
from wrds_cik5 as a left join wrds_cik_a as b
on a.cik = b.cik;
quit;
wrds_cik5 only has about 180,000 observations. Yet, it returns 300,000,000 observations after this code. Why is it exploding? (I looked on this board for info to an almost identical question earlier, and the answer was, "left join has always done this.")
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You likely have duplicate CIK records in each table. If you have 5 CIK in Table1 and 3 in Table2 after the left join you'll have 15 records. You likely need to add another filter to the join or someway to reduce those duplicates.
@cadams47 wrote:
I use Proc Sql to Left Join in SAS all the time. But the last few months, I feel like the same code I always used makes the dataset explode instead of actually left join. I know this crazy. I am still just using SAS 9.4.
For example,
proc sql;
create table wrds_cik6
as select *, b.gvkey as comp_gvkey
from wrds_cik5 as a left join wrds_cik_a as b
on a.cik = b.cik;
quit;
wrds_cik5 only has about 180,000 observations. Yet, it returns 300,000,000 observations after this code. Why is it exploding? (I looked on this board for info to an almost identical question earlier, and the answer was, "left join has always done this.")
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just like Reeza said, it appears that you have duplicate values for the joined variable in at least one of the tables that you are joining.
Here is some code to show this happening:
data work.have1;
col1 = 1;
col2 = 1;
output;
col1 = 2;
col2 = 1;
output;
col1 = 3;
col2 = 1;
output;
run;
data work.have2;
col1 = 1;
col3 = 2;
output;
col1 = 2;
col3 = 2;
output;
col1 = 4;
col3 = 2;
output;
run;
proc sql;
create table work.want as
select *
, b.col3
from work.have1 as a
left join work.have2 as b
on a.col1 = b.col3
;
quit;
Even though we are joining two tables with 3 rows each, the table created has 5 rows.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ahhh. You're right. I have multiple cik's from my right-side table. Thank you both for the help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Something like this *might* work around your issue, depending on your data.
proc sql;
create table WRDS_CIK6
as select c5.*, ca.GVKEY as COMP_GVKEY
from WRDS_CIK5 c5
left join
(select unique CIK, GVKEY from WRDS_CIK_A) ca
on c5.CIK = ca.CIK;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When I run into this issue, I use this construct in SQL. There are other ways in base SAS such as SORT or FREQ, but I usually revert to this construct because I work directly in the RDBMS a lot.
proc sql;
create table test as
select cik, count(0) as count
from wrds_cik5
group by cik
having count(0) > 1;
quit;
proc sql;
create table test as
select cik, count(0) as count
from wrds_cik_a
group by cik
having count(0) > 1;
quit;
The output should be zero records, meaning your keys (group by variables) uniquely identify each record.
For a LEFT JOIN, you can have dups in the left table, but if you have dups in the right table, your target table will "explode".
Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.