BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mei
Calcite | Level 5 mei
Calcite | Level 5

I have 2 files

sortcompustat-contains financial data (65929 obs)

sortedmain_vwretd200-contains calculated indexes (53539 obs)

I want to merge these 2 files,  using their permno (company code) and financial year.

lpermno of sortcompustat = permno of sortedmain_vwretd200

fyear of both files are the same.

I have used the following proc sql steps, however the output adv_vwretd has 463498 observations having duplicate merging:

proc sql;

create table adv_vwretd as

select a.* , b.*

from sorted_main_vwretd200 a left join  sortcompustat b

on a.permno=b.lpermno

order by permno, fyear;

run;

how do i merge the two files with 1 permno for each financial year would contain 1 observation?

1 ACCEPTED SOLUTION

Accepted Solutions
shivas
Pyrite | Level 9

Hi mei,

If you got the correct result you can ignore the warnings,if not you need to code like this..

proc sql;

create table adv_vwretd as

select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD

from tt.sorted_main_vwretd200 a left join  tt.sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by a.permno, a.fyear;

quit;

Thanks,

Shiva

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

You're saying you wish to use permno and financial year as merge keys, but you are using only permno in the SQL...?

Obviously this will create a M-M join.

Data never sleeps
mei
Calcite | Level 5 mei
Calcite | Level 5

how do i modify my program?

Ksharp
Super User

proc sql;

create table adv_vwretd as

select a.* , b.*

from sorted_main_vwretd200 a left join  sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by permno, fyear;

run;

mei
Calcite | Level 5 mei
Calcite | Level 5

This is the log

197  proc sql;

198  create table vwretd.adv_vwretd3 as

199  select a.* , b.*

200  from vwretd.sorted_vwretd_pexpand a left join  vwretd.sortcompustat b

201  on a.permno=b.lpermno and a.fyear=b.fyear

202  order by permno, fyear;

WARNING: Column named fyear is duplicated in a select expression (or a view). Explicit

         references to it will be to the first one.

WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD3.

NOTE: Table VWRETD.ADV_VWRETD3 created, with 53539 rows and 45 columns.

Is there anyway i can improve to address the warning?

Ksharp
Super User

Hi. From the message of log, it looks like variable FYEAR is appeared in both sorted_vwretd_pexpand and sortcompustat. Thus we need to rename it at SELECT clause to avoid conflicting .

197  proc sql;

198  create table vwretd.adv_vwretd3 as

199  select a.* , b.fyear as _fyear

200  from vwretd.sorted_vwretd_pexpand a left join  vwretd.sortcompustat b

201  on a.permno=b.lpermno and a.fyear=b.fyear

202  order by permno, fyear;

shivas
Pyrite | Level 9

Hi,

Try this..

proc sql;

create table adv_vwretd as

select a.* , b.*

from tt.sorted_main_vwretd200 a left join  tt.sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by a.permno, a.fyear;

quit;

Thanks,

Shiva

mei
Calcite | Level 5 mei
Calcite | Level 5

Dear Shivas, this is the log

301  proc sql;

302

303  create table vwretd.adv_vwretd5 as

304

305  select a.* , b.*

306

307  from vwretd.sorted_vwretd_pexpand a left join  vwretd.sortcompustat b

308

309  on a.permno=b.lpermno and a.fyear=b.fyear

310

311  order by a.permno, a.fyear;

WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD5.

NOTE: Table VWRETD.ADV_VWRETD5 created, with 53539 rows and 45 columns.

312

313  quit;

is the problem considered solved?

shivas
Pyrite | Level 9

Hi mei,

If you got the correct result you can ignore the warnings,if not you need to code like this..

proc sql;

create table adv_vwretd as

select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD

from tt.sorted_main_vwretd200 a left join  tt.sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by a.permno, a.fyear;

quit;

Thanks,

Shiva

mei
Calcite | Level 5 mei
Calcite | Level 5

Dear Ksharp,

only variable of fyear of b (vwreted.sortcompustat) would be included in the table, other variables required would not be selected.

Ksharp
Super User

Hi. You can type it on your own just like shivas did .

Don't forget use comma as delimiter.

AndersS
Lapis Lazuli | Level 10

Hi! I checked the two tables.  FYEAR PERMNO  is a unique key (if you rename LPERMNO to PERMNO).
So - no problem! Use SQL or ordinary MERGE!

My check program:

libname mylib '.......\SAS_MatStat';

proc sort data=mylib.sortcompustat    out= work.sortcompustat;

   by fyear lpermno;

run;

data work.err1;

   set work.sortcompustat;

   by fyear lpermno;

   if not(first.lpermno) then output;

run;

proc sort data= mylib.sorted_main_vwretd200    out= work.sorted_main_vwretd200;

   by fyear permno;

run;

data work.err2;

   set work.sorted_main_vwretd200;

   by fyear permno;

   if not(first.permno) then output;

run;

ERR1 and ERR2 are both empty - so no problem.
/ Br Anders

Anders Sköllermo (Skollermo in English)
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3950 views
  • 6 likes
  • 5 in conversation