I have 2 files
sortcompustat-contains financial data (65929 obs)
sortedmain_vwretd200-contains calculated indexes (53539 obs)
I want to merge these 2 files, using their permno (company code) and financial year.
lpermno of sortcompustat = permno of sortedmain_vwretd200
fyear of both files are the same.
I have used the following proc sql steps, however the output adv_vwretd has 463498 observations having duplicate merging:
proc sql;
create table adv_vwretd as
select a.* , b.*
from sorted_main_vwretd200 a left join sortcompustat b
on a.permno=b.lpermno
order by permno, fyear;
run;
how do i merge the two files with 1 permno for each financial year would contain 1 observation?
Hi mei,
If you got the correct result you can ignore the warnings,if not you need to code like this..
proc sql;
create table adv_vwretd as
select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD
from tt.sorted_main_vwretd200 a left join tt.sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by a.permno, a.fyear;
quit;
Thanks,
Shiva
You're saying you wish to use permno and financial year as merge keys, but you are using only permno in the SQL...?
Obviously this will create a M-M join.
how do i modify my program?
proc sql;
create table adv_vwretd as
select a.* , b.*
from sorted_main_vwretd200 a left join sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by permno, fyear;
run;
This is the log
197 proc sql;
198 create table vwretd.adv_vwretd3 as
199 select a.* , b.*
200 from vwretd.sorted_vwretd_pexpand a left join vwretd.sortcompustat b
201 on a.permno=b.lpermno and a.fyear=b.fyear
202 order by permno, fyear;
WARNING: Column named fyear is duplicated in a select expression (or a view). Explicit
references to it will be to the first one.
WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD3.
NOTE: Table VWRETD.ADV_VWRETD3 created, with 53539 rows and 45 columns.
Is there anyway i can improve to address the warning?
Hi. From the message of log, it looks like variable FYEAR is appeared in both sorted_vwretd_pexpand and sortcompustat. Thus we need to rename it at SELECT clause to avoid conflicting .
197 proc sql;
198 create table vwretd.adv_vwretd3 as
199 select a.* , b.fyear as _fyear
200 from vwretd.sorted_vwretd_pexpand a left join vwretd.sortcompustat b
201 on a.permno=b.lpermno and a.fyear=b.fyear
202 order by permno, fyear;
Hi,
Try this..
proc sql;
create table adv_vwretd as
select a.* , b.*
from tt.sorted_main_vwretd200 a left join tt.sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by a.permno, a.fyear;
quit;
Thanks,
Shiva
Dear Shivas, this is the log
301 proc sql;
302
303 create table vwretd.adv_vwretd5 as
304
305 select a.* , b.*
306
307 from vwretd.sorted_vwretd_pexpand a left join vwretd.sortcompustat b
308
309 on a.permno=b.lpermno and a.fyear=b.fyear
310
311 order by a.permno, a.fyear;
WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD5.
NOTE: Table VWRETD.ADV_VWRETD5 created, with 53539 rows and 45 columns.
312
313 quit;
is the problem considered solved?
Hi mei,
If you got the correct result you can ignore the warnings,if not you need to code like this..
proc sql;
create table adv_vwretd as
select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD
from tt.sorted_main_vwretd200 a left join tt.sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by a.permno, a.fyear;
quit;
Thanks,
Shiva
Dear Ksharp,
only variable of fyear of b (vwreted.sortcompustat) would be included in the table, other variables required would not be selected.
Hi. You can type it on your own just like shivas did .
Don't forget use comma as delimiter.
Hi! I checked the two tables. FYEAR PERMNO is a unique key (if you rename LPERMNO to PERMNO).
So - no problem! Use SQL or ordinary MERGE!
My check program:
libname mylib '.......\SAS_MatStat';
proc sort data=mylib.sortcompustat out= work.sortcompustat;
by fyear lpermno;
run;
data work.err1;
set work.sortcompustat;
by fyear lpermno;
if not(first.lpermno) then output;
run;
proc sort data= mylib.sorted_main_vwretd200 out= work.sorted_main_vwretd200;
by fyear permno;
run;
data work.err2;
set work.sorted_main_vwretd200;
by fyear permno;
if not(first.permno) then output;
run;
ERR1 and ERR2 are both empty - so no problem.
/ Br Anders
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.