Hello all!
I want to convert mixed quarterly and annual frequency data to monthly data in panel data setting.
I have
summary_period2 : represents frequency, (A: annual, Q: quarterly, AQ:quarterly)
crsp_fundno : entity id variable
caldt : calendar date
end_dt : ending calendar date for each entitiy
summary_period2 | crsp_fundno | exp_ratio | caldt | end_dt |
A | 1 | 0.0025 | 19971231 | 20000731 |
A | 1 | 0.0019 | 19981231 | 20000731 |
Q | 1 | 0.0019 | 19990331 | 20000731 |
Q | 1 | 0.0019 | 19990630 | 20000731 |
Q | 1 | 0.0042 | 19990930 | 20000731 |
AQ | 1 | 0.0042 | 19991231 | 20000731 |
Q | 1 | 0.0042 | 20000331 | 20000731 |
Q | 1 | 0.0042 | 20000630 | 20000731 |
A | 2 | 0 | 19971231 | 20000831 |
A | 2 | 0 | 19981231 | 20000831 |
Q | 2 | 0 | 19990331 | 20000831 |
Q | 2 | 0 | 19990630 | 20000831 |
Q | 2 | 0 | 19990930 | 20000831 |
AQ | 2 | 0 | 19991231 | 20000831 |
Q | 2 | 0 | 20000331 | 20000831 |
Q | 2 | 0 | 20000630 | 20000831 |
data have;
input summary_period2 $ crsp_fundno exp_ratio caldt end_dt;
datalines;
A 1 0.0025 19971231 20000731
A 1 0.0019 19981231 20000731
Q 1 0.0019 19990331 20000731
Q 1 0.0019 19990630 20000731
Q 1 0.0042 19990930 20000731
AQ 1 0.0042 19991231 20000731
Q 1 0.0042 20000331 20000731
Q 1 0.0042 20000630 20000731
A 2 0 19971231 20000831
A 2 0 19981231 20000831
Q 2 0 19990331 20000831
Q 2 0 19990630 20000831
Q 2 0 19990930 20000831
AQ 2 0 19991231 20000831
Q 2 0 20000331 20000831
Q 2 0 20000630 20000831
;
run;
and want
crsp_fundno | exp_ratio | caldt |
1 | 0.0025 | 19970131 |
1 | 0.0025 | 19970228 |
1 | 0.0025 | 19970331 |
1 | 0.0025 | 19970430 |
1 | 0.0025 | 19970531 |
1 | 0.0025 | 19970630 |
1 | 0.0025 | 19970731 |
1 | 0.0025 | 19970830 |
1 | 0.0025 | 19970931 |
1 | 0.0025 | 19971030 |
1 | 0.0025 | 19971131 |
1 | 0.0025 | 19971231 |
1 | 0.0019 | 19980131 |
1 | 0.0019 | 19980228 |
1 | 0.0019 | 19980331 |
1 | 0.0019 | 19980430 |
1 | 0.0019 | 19980531 |
1 | 0.0019 | 19980630 |
1 | 0.0019 | 19980731 |
1 | 0.0019 | 19980831 |
1 | 0.0019 | 19980930 |
1 | 0.0019 | 19981031 |
1 | 0.0019 | 19981130 |
1 | 0.0019 | 19981231 |
1 | 0.0019 | 19990131 |
1 | 0.0019 | 19990228 |
1 | 0.0019 | 19990331 |
1 | 0.0019 | 19990430 |
1 | 0.0019 | 19990531 |
1 | 0.0019 | 19990630 |
1 | 0.0042 | 19990731 |
1 | 0.0042 | 19990831 |
1 | 0.0042 | 19990930 |
1 | 0.0042 | 19991031 |
1 | 0.0042 | 19991130 |
1 | 0.0042 | 19991231 |
1 | 0.0042 | 20000131 |
1 | 0.0042 | 20000229 |
1 | 0.0042 | 20000331 |
1 | 0.0042 | 20000430 |
1 | 0.0042 | 20000531 |
1 | 0.0042 | 20000630 |
1 | 0.0042 | 20000731 |
2 | 0 | 19970131 |
2 | 0 | 19970228 |
2 | 0 | 19970331 |
2 | 0 | 19970430 |
2 | 0 | 19970531 |
2 | 0 | 19970630 |
2 | 0 | 19970731 |
2 | 0 | 19970831 |
2 | 0 | 19970930 |
2 | 0 | 19971031 |
2 | 0 | 19971130 |
2 | 0 | 19971231 |
2 | 0 | 19980131 |
2 | 0 | 19980228 |
2 | 0 | 19980331 |
2 | 0 | 19980430 |
2 | 0 | 19980531 |
2 | 0 | 19980630 |
2 | 0 | 19980731 |
2 | 0 | 19980831 |
2 | 0 | 19980930 |
2 | 0 | 19981031 |
2 | 0 | 19981130 |
2 | 0 | 19981231 |
2 | 0 | 19990331 |
2 | 0 | 19990630 |
2 | 0 | 19990930 |
2 | 0 | 19991231 |
2 | 0 | 20000331 |
2 | 0 | 20000630 |
2 | 0 | 20000731 |
2 | 0 | 20000831 |
In this situation how do I get the dataset I wanted?
If you have SAS ETS, try PROC TIMESERIES.
If you don't, a manual data step with a start/end date and a loop works:
data want;
set have;
do date= start_date to end_date;
output;
end;
run;
@jkim197 wrote:
Hello all!
I want to convert mixed quarterly and annual frequency data to monthly data in panel data setting.
I have
summary_period2 : represents frequency, (A: annual, Q: quarterly, AQ:quarterly)
crsp_fundno : entity id variable
caldt : calendar date
end_dt : ending calendar date for each entitiy
summary_period2 crsp_fundno exp_ratio caldt end_dt A 1 0.0025 19971231 20000731 A 1 0.0019 19981231 20000731 Q 1 0.0019 19990331 20000731 Q 1 0.0019 19990630 20000731 Q 1 0.0042 19990930 20000731 AQ 1 0.0042 19991231 20000731 Q 1 0.0042 20000331 20000731 Q 1 0.0042 20000630 20000731 A 2 0 19971231 20000831 A 2 0 19981231 20000831 Q 2 0 19990331 20000831 Q 2 0 19990630 20000831 Q 2 0 19990930 20000831 AQ 2 0 19991231 20000831 Q 2 0 20000331 20000831 Q 2 0 20000630 20000831
data have; input summary_period2 $ crsp_fundno exp_ratio caldt end_dt; datalines; A 1 0.0025 19971231 20000731 A 1 0.0019 19981231 20000731 Q 1 0.0019 19990331 20000731 Q 1 0.0019 19990630 20000731 Q 1 0.0042 19990930 20000731 AQ 1 0.0042 19991231 20000731 Q 1 0.0042 20000331 20000731 Q 1 0.0042 20000630 20000731 A 2 0 19971231 20000831 A 2 0 19981231 20000831 Q 2 0 19990331 20000831 Q 2 0 19990630 20000831 Q 2 0 19990930 20000831 AQ 2 0 19991231 20000831 Q 2 0 20000331 20000831 Q 2 0 20000630 20000831 ; run;
and want
crsp_fundno exp_ratio caldt 1 0.0025 19970131 1 0.0025 19970228 1 0.0025 19970331 1 0.0025 19970430 1 0.0025 19970531 1 0.0025 19970630 1 0.0025 19970731 1 0.0025 19970830 1 0.0025 19970931 1 0.0025 19971030 1 0.0025 19971131 1 0.0025 19971231 1 0.0019 19980131 1 0.0019 19980228 1 0.0019 19980331 1 0.0019 19980430 1 0.0019 19980531 1 0.0019 19980630 1 0.0019 19980731 1 0.0019 19980831 1 0.0019 19980930 1 0.0019 19981031 1 0.0019 19981130 1 0.0019 19981231 1 0.0019 19990131 1 0.0019 19990228 1 0.0019 19990331 1 0.0019 19990430 1 0.0019 19990531 1 0.0019 19990630 1 0.0042 19990731 1 0.0042 19990831 1 0.0042 19990930 1 0.0042 19991031 1 0.0042 19991130 1 0.0042 19991231 1 0.0042 20000131 1 0.0042 20000229 1 0.0042 20000331 1 0.0042 20000430 1 0.0042 20000531 1 0.0042 20000630 1 0.0042 20000731 2 0 19970131 2 0 19970228 2 0 19970331 2 0 19970430 2 0 19970531 2 0 19970630 2 0 19970731 2 0 19970831 2 0 19970930 2 0 19971031 2 0 19971130 2 0 19971231 2 0 19980131 2 0 19980228 2 0 19980331 2 0 19980430 2 0 19980531 2 0 19980630 2 0 19980731 2 0 19980831 2 0 19980930 2 0 19981031 2 0 19981130 2 0 19981231 2 0 19990331 2 0 19990630 2 0 19990930 2 0 19991231 2 0 20000331 2 0 20000630 2 0 20000731 2 0 20000831
In this situation how do I get the dataset I wanted?
@Reeza, the line
do date= start_date to end_date;
will loop a day by day.
If I understand correctly, the loop should be by month, the the code should be:
(the code was droped from my previos post)
data want;
set have(rename=(caldt = start_dt));
caldt = start_dt;
drop start_dt end_dt;
do while (caldt le end_dt);
output;
caldt = intnx('month', caldt, 1 ,'end');
end;
run;
data have;
input summary_period2 $ crsp_fundno exp_ratio caldt : yymmdd10. end_dt : yymmdd10.;
format caldt yymmdd10. end_dt yymmdd10.;
datalines;
A 1 0.0025 19971231 20000731
A 1 0.0019 19981231 20000731
Q 1 0.0019 19990331 20000731
Q 1 0.0019 19990630 20000731
Q 1 0.0042 19990930 20000731
AQ 1 0.0042 19991231 20000731
Q 1 0.0042 20000331 20000731
Q 1 0.0042 20000630 20000731
A 2 0 19971231 20000831
A 2 0 19981231 20000831
Q 2 0 19990331 20000831
Q 2 0 19990630 20000831
Q 2 0 19990930 20000831
AQ 2 0 19991231 20000831
Q 2 0 20000331 20000831
Q 2 0 20000630 20000831
;
run;
data want(drop=caldt _n rename=(n=caldt));
set have;
by crsp_fundno;
retain _n;
if first.crsp_fundno then do;
call missing(n);
n=intnx('month',intnx('year',caldt,0,'b')-1,0,'e');
end;
else n=_n;
do until(n=caldt);
n=intnx('month',n,1,'e');
output;
end;
if last.crsp_fundno then do;n=end_dt;output;end;
_n=n;
format n _n yymmdd10.;
run;
@jkim197I need some in help in understanding your expected results for crsp_fundno =2
2 | 0 | 19991231 |
2 | 0 | 20000331 |
2 | 0 | 20000630 |
Seems like i am missing something as I am not spotting the month sequence above
@jkim197wrote:
if I dont want to interpolate the exp_ratio so that
I stop at last caldt, what should I do?
1 0.0042 20000131
1 0.0042 20000229
1 0.0042 20000331
1 0.0042 20000430
1 0.0042 20000531
1 0.0042 20000630
* stop at here not 20000731
For that all you need to is comment out --->if last.crsp_fundno then do;n=end_dt;output;end;
data have;
input summary_period2 $ crsp_fundno exp_ratio caldt : yymmdd10. end_dt : yymmdd10.;
format caldt yymmdd10. end_dt yymmdd10.;
datalines;
A 1 0.0025 19971231 20000731
A 1 0.0019 19981231 20000731
Q 1 0.0019 19990331 20000731
Q 1 0.0019 19990630 20000731
Q 1 0.0042 19990930 20000731
AQ 1 0.0042 19991231 20000731
Q 1 0.0042 20000331 20000731
Q 1 0.0042 20000630 20000731
A 2 0 19971231 20000831
A 2 0 19981231 20000831
Q 2 0 19990331 20000831
Q 2 0 19990630 20000831
Q 2 0 19990930 20000831
AQ 2 0 19991231 20000831
Q 2 0 20000331 20000831
Q 2 0 20000630 20000831
;
run;
data want(drop=caldt _n rename=(n=caldt));
set have;
by crsp_fundno;
retain _n;
if first.crsp_fundno then do;
call missing(_n);
n=intnx('month',intnx('year',caldt,0,'b')-1,0,'e');
end;
else n=_n;
do until(n=caldt);
n=intnx('month',n,1,'e');
output;
end;
/*if last.crsp_fundno then do;n=end_dt;output;end;*/
_n=n;
format n _n yymmdd10.;
run;
I'll look forward to your feedback and see if you require any further changes/edit. Very interesting question indeed! Thank you!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.