I have 2 files
sortcompustat-contains financial data (65929 obs)
sortedmain_vwretd200-contains calculated indexes (53539 obs)
I want to merge these 2 files, using their permno (company code) and financial year.
lpermno of sortcompustat = permno of sortedmain_vwretd200
fyear of both files are the same.
I have used the following proc sql steps, however the output adv_vwretd has 463498 observations having duplicate merging:
proc sql;
create table adv_vwretd as
select a.* , b.*
from sorted_main_vwretd200 a left join sortcompustat b
on a.permno=b.lpermno
order by permno, fyear;
run;
how do i merge the two files with 1 permno for each financial year would contain 1 observation?
Hi mei,
If you got the correct result you can ignore the warnings,if not you need to code like this..
proc sql;
create table adv_vwretd as
select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD
from tt.sorted_main_vwretd200 a left join tt.sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by a.permno, a.fyear;
quit;
Thanks,
Shiva
You're saying you wish to use permno and financial year as merge keys, but you are using only permno in the SQL...?
Obviously this will create a M-M join.
how do i modify my program?
proc sql;
create table adv_vwretd as
select a.* , b.*
from sorted_main_vwretd200 a left join sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by permno, fyear;
run;
This is the log
197 proc sql;
198 create table vwretd.adv_vwretd3 as
199 select a.* , b.*
200 from vwretd.sorted_vwretd_pexpand a left join vwretd.sortcompustat b
201 on a.permno=b.lpermno and a.fyear=b.fyear
202 order by permno, fyear;
WARNING: Column named fyear is duplicated in a select expression (or a view). Explicit
references to it will be to the first one.
WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD3.
NOTE: Table VWRETD.ADV_VWRETD3 created, with 53539 rows and 45 columns.
Is there anyway i can improve to address the warning?
Hi. From the message of log, it looks like variable FYEAR is appeared in both sorted_vwretd_pexpand and sortcompustat. Thus we need to rename it at SELECT clause to avoid conflicting .
197 proc sql;
198 create table vwretd.adv_vwretd3 as
199 select a.* , b.fyear as _fyear
200 from vwretd.sorted_vwretd_pexpand a left join vwretd.sortcompustat b
201 on a.permno=b.lpermno and a.fyear=b.fyear
202 order by permno, fyear;
Hi,
Try this..
proc sql;
create table adv_vwretd as
select a.* , b.*
from tt.sorted_main_vwretd200 a left join tt.sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by a.permno, a.fyear;
quit;
Thanks,
Shiva
Dear Shivas, this is the log
301 proc sql;
302
303 create table vwretd.adv_vwretd5 as
304
305 select a.* , b.*
306
307 from vwretd.sorted_vwretd_pexpand a left join vwretd.sortcompustat b
308
309 on a.permno=b.lpermno and a.fyear=b.fyear
310
311 order by a.permno, a.fyear;
WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD5.
NOTE: Table VWRETD.ADV_VWRETD5 created, with 53539 rows and 45 columns.
312
313 quit;
is the problem considered solved?
Hi mei,
If you got the correct result you can ignore the warnings,if not you need to code like this..
proc sql;
create table adv_vwretd as
select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD
from tt.sorted_main_vwretd200 a left join tt.sortcompustat b
on a.permno=b.lpermno and a.fyear=b.fyear
order by a.permno, a.fyear;
quit;
Thanks,
Shiva
Dear Ksharp,
only variable of fyear of b (vwreted.sortcompustat) would be included in the table, other variables required would not be selected.
Hi. You can type it on your own just like shivas did .
Don't forget use comma as delimiter.
Hi! I checked the two tables. FYEAR PERMNO is a unique key (if you rename LPERMNO to PERMNO).
So - no problem! Use SQL or ordinary MERGE!
My check program:
libname mylib '.......\SAS_MatStat';
proc sort data=mylib.sortcompustat out= work.sortcompustat;
by fyear lpermno;
run;
data work.err1;
set work.sortcompustat;
by fyear lpermno;
if not(first.lpermno) then output;
run;
proc sort data= mylib.sorted_main_vwretd200 out= work.sorted_main_vwretd200;
by fyear permno;
run;
data work.err2;
set work.sorted_main_vwretd200;
by fyear permno;
if not(first.permno) then output;
run;
ERR1 and ERR2 are both empty - so no problem.
/ Br Anders
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.