Merging data by ID and the nearest quarter


Hi everyone. 

I have two datasets that need to be merged. The first dataset contains around 39k observations, which are acquisition activities undertaken by U.S. firms. It contains each firm's unique ID (called PERMNO) and the date on which the acquisition was completed. I want to merge this dataset with the second dataset, which contains PERMNO and each firm's quarterly accounting data (e.g. total assets) over a period of time.

 

I formatted the date in both datasets as yyq. (i.e. year and quarter), then sorted them by PERMNO and date. I then merged by PERMNO and date using proc sql. The code is as follows:

 

proc sql;
create table want as
select dataset1.*,dataset2.*
from dataset1 left join dataset2
on dataset1.PERMNO=dataset2.LPERMNO and dataset1.date=dataset2.datadate;
quit;

 

 

However, a significant number of observations are missing from the merged table, which should not be the case given the two original datasets. Have I missed something?

 

Thanks 


Accepted Solutions
Solution
‎07-12-2017 01:57 AM

Re: Merging data by ID and the nearest quarter

 

It's impossible to reply without sample data, but one thing stands out:

 

 > I format the date of both dataset as yyq. (ie. year and quarter), then sort them

 

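Formatting has no effect on the sort or the merge; it only affects how the value is displayed. The stored values are still daily dates, so two dates in the same quarter match only if they fall on exactly the same day.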

 

If you want to merge by quarter you need to make this known. Either create a new variable, or change the SQL:

 

on dataset1.PERMNO=dataset2.LPERMNO
and put(dataset1.DATE,yyq.)=put(dataset2.DATADATE,yyq.);
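For the other option mentioned, creating a new variable, a minimal sketch might look like the following. It assumes DATE and DATADATE are numeric SAS date values (not datetimes or character strings); the dataset names acq and fund and the variables acq_qtr and fund_qtr are hypothetical names chosen for illustration.

```sas
/* Sketch only: derive an explicit quarter key in each dataset, then join on it. */
/* Assumes DATE and DATADATE are numeric SAS date values.                        */
/* acq, fund, acq_qtr and fund_qtr are hypothetical names.                       */
data acq;
    set dataset1;
    /* Shift each date to the first day of its quarter */
    acq_qtr = intnx('qtr', date, 0, 'b');
    format acq_qtr yyq6.;
run;

data fund;
    set dataset2;
    fund_qtr = intnx('qtr', datadate, 0, 'b');
    format fund_qtr yyq6.;
run;

proc sql;
    create table want as
    select a.*, f.*
    from acq as a
    left join fund as f
        on  a.PERMNO  = f.LPERMNO
        and a.acq_qtr = f.fund_qtr;   /* compares stored values, not formats */
quit;
```

Because the key is a real variable holding the quarter start date, the join compares stored values directly, and the same key can be reused for sorting or for a DATA step merge. If the second dataset can hold more than one row per firm and quarter, the join will return one output row for each match.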
