BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mei
Calcite | Level 5 mei
Calcite | Level 5

I have 2 files

sortcompustat-contains financial data (65929 obs)

sortedmain_vwretd200-contains calculated indexes (53539 obs)

I want to merge these 2 files,  using their permno (company code) and financial year.

lpermno of sortcompustat = permno of sortedmain_vwretd200

fyear of both files are the same.

I have used the following proc sql steps, however the output adv_vwretd has 463498 observations having duplicate merging:

proc sql;

create table adv_vwretd as

select a.* , b.*

from sorted_main_vwretd200 a left join  sortcompustat b

on a.permno=b.lpermno

order by permno, fyear;

run;

how do i merge the two files with 1 permno for each financial year would contain 1 observation?

1 ACCEPTED SOLUTION

Accepted Solutions
shivas
Pyrite | Level 9

Hi mei,

If you got the correct result you can ignore the warnings,if not you need to code like this..

proc sql;

create table adv_vwretd as

select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD

from tt.sorted_main_vwretd200 a left join  tt.sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by a.permno, a.fyear;

quit;

Thanks,

Shiva

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

You're saying you wish to use permno and financial year as merge keys, but you are using only permno in the SQL...?

Obviously this will create a M-M join.

Data never sleeps
mei
Calcite | Level 5 mei
Calcite | Level 5

how do i modify my program?

Ksharp
Super User

proc sql;

create table adv_vwretd as

select a.* , b.*

from sorted_main_vwretd200 a left join  sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by permno, fyear;

run;

mei
Calcite | Level 5 mei
Calcite | Level 5

This is the log

197  proc sql;

198  create table vwretd.adv_vwretd3 as

199  select a.* , b.*

200  from vwretd.sorted_vwretd_pexpand a left join  vwretd.sortcompustat b

201  on a.permno=b.lpermno and a.fyear=b.fyear

202  order by permno, fyear;

WARNING: Column named fyear is duplicated in a select expression (or a view). Explicit

         references to it will be to the first one.

WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD3.

NOTE: Table VWRETD.ADV_VWRETD3 created, with 53539 rows and 45 columns.

Is there anyway i can improve to address the warning?

Ksharp
Super User

Hi. From the message of log, it looks like variable FYEAR is appeared in both sorted_vwretd_pexpand and sortcompustat. Thus we need to rename it at SELECT clause to avoid conflicting .

197  proc sql;

198  create table vwretd.adv_vwretd3 as

199  select a.* , b.fyear as _fyear

200  from vwretd.sorted_vwretd_pexpand a left join  vwretd.sortcompustat b

201  on a.permno=b.lpermno and a.fyear=b.fyear

202  order by permno, fyear;

shivas
Pyrite | Level 9

Hi,

Try this..

proc sql;

create table adv_vwretd as

select a.* , b.*

from tt.sorted_main_vwretd200 a left join  tt.sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by a.permno, a.fyear;

quit;

Thanks,

Shiva

mei
Calcite | Level 5 mei
Calcite | Level 5

Dear Shivas, this is the log

301  proc sql;

302

303  create table vwretd.adv_vwretd5 as

304

305  select a.* , b.*

306

307  from vwretd.sorted_vwretd_pexpand a left join  vwretd.sortcompustat b

308

309  on a.permno=b.lpermno and a.fyear=b.fyear

310

311  order by a.permno, a.fyear;

WARNING: Variable FYEAR already exists on file VWRETD.ADV_VWRETD5.

NOTE: Table VWRETD.ADV_VWRETD5 created, with 53539 rows and 45 columns.

312

313  quit;

is the problem considered solved?

shivas
Pyrite | Level 9

Hi mei,

If you got the correct result you can ignore the warnings,if not you need to code like this..

proc sql;

create table adv_vwretd as

select a.* , b.AT,b.CIK,b.CONM,b.CUSIP,b.DATADATE,b.GVKEY,b.LPERMCO,b.LPERMNO,b.REVT,b.XAD

from tt.sorted_main_vwretd200 a left join  tt.sortcompustat b

on a.permno=b.lpermno and a.fyear=b.fyear

order by a.permno, a.fyear;

quit;

Thanks,

Shiva

mei
Calcite | Level 5 mei
Calcite | Level 5

Dear Ksharp,

only variable of fyear of b (vwreted.sortcompustat) would be included in the table, other variables required would not be selected.

Ksharp
Super User

Hi. You can type it on your own just like shivas did .

Don't forget use comma as delimiter.

AndersS
Lapis Lazuli | Level 10

Hi! I checked the two tables.  FYEAR PERMNO  is a unique key (if you rename LPERMNO to PERMNO).
So - no problem! Use SQL or ordinary MERGE!

My check program:

libname mylib '.......\SAS_MatStat';

proc sort data=mylib.sortcompustat    out= work.sortcompustat;

   by fyear lpermno;

run;

data work.err1;

   set work.sortcompustat;

   by fyear lpermno;

   if not(first.lpermno) then output;

run;

proc sort data= mylib.sorted_main_vwretd200    out= work.sorted_main_vwretd200;

   by fyear permno;

run;

data work.err2;

   set work.sorted_main_vwretd200;

   by fyear permno;

   if not(first.permno) then output;

run;

ERR1 and ERR2 are both empty - so no problem.
/ Br Anders

Anders Sköllermo (Skollermo in English)

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3901 views
  • 6 likes
  • 5 in conversation