Hello all!
I want to convert mixed quarterly and annual frequency data to monthly data in panel data setting.
I have
summary_period2 : represents frequency, (A: annual, Q: quarterly, AQ:quarterly)
crsp_fundno : entity id variable
caldt : calendar date
end_dt : ending calendar date for each entitiy
summary_period2 | crsp_fundno | exp_ratio | caldt | end_dt |
A | 1 | 0.0025 | 19971231 | 20000731 |
A | 1 | 0.0019 | 19981231 | 20000731 |
Q | 1 | 0.0019 | 19990331 | 20000731 |
Q | 1 | 0.0019 | 19990630 | 20000731 |
Q | 1 | 0.0042 | 19990930 | 20000731 |
AQ | 1 | 0.0042 | 19991231 | 20000731 |
Q | 1 | 0.0042 | 20000331 | 20000731 |
Q | 1 | 0.0042 | 20000630 | 20000731 |
A | 2 | 0 | 19971231 | 20000831 |
A | 2 | 0 | 19981231 | 20000831 |
Q | 2 | 0 | 19990331 | 20000831 |
Q | 2 | 0 | 19990630 | 20000831 |
Q | 2 | 0 | 19990930 | 20000831 |
AQ | 2 | 0 | 19991231 | 20000831 |
Q | 2 | 0 | 20000331 | 20000831 |
Q | 2 | 0 | 20000630 | 20000831 |
data have;
input summary_period2 $ crsp_fundno exp_ratio caldt end_dt;
datalines;
A 1 0.0025 19971231 20000731
A 1 0.0019 19981231 20000731
Q 1 0.0019 19990331 20000731
Q 1 0.0019 19990630 20000731
Q 1 0.0042 19990930 20000731
AQ 1 0.0042 19991231 20000731
Q 1 0.0042 20000331 20000731
Q 1 0.0042 20000630 20000731
A 2 0 19971231 20000831
A 2 0 19981231 20000831
Q 2 0 19990331 20000831
Q 2 0 19990630 20000831
Q 2 0 19990930 20000831
AQ 2 0 19991231 20000831
Q 2 0 20000331 20000831
Q 2 0 20000630 20000831
;
run;
and want
crsp_fundno | exp_ratio | caldt |
1 | 0.0025 | 19970131 |
1 | 0.0025 | 19970228 |
1 | 0.0025 | 19970331 |
1 | 0.0025 | 19970430 |
1 | 0.0025 | 19970531 |
1 | 0.0025 | 19970630 |
1 | 0.0025 | 19970731 |
1 | 0.0025 | 19970830 |
1 | 0.0025 | 19970931 |
1 | 0.0025 | 19971030 |
1 | 0.0025 | 19971131 |
1 | 0.0025 | 19971231 |
1 | 0.0019 | 19980131 |
1 | 0.0019 | 19980228 |
1 | 0.0019 | 19980331 |
1 | 0.0019 | 19980430 |
1 | 0.0019 | 19980531 |
1 | 0.0019 | 19980630 |
1 | 0.0019 | 19980731 |
1 | 0.0019 | 19980831 |
1 | 0.0019 | 19980930 |
1 | 0.0019 | 19981031 |
1 | 0.0019 | 19981130 |
1 | 0.0019 | 19981231 |
1 | 0.0019 | 19990131 |
1 | 0.0019 | 19990228 |
1 | 0.0019 | 19990331 |
1 | 0.0019 | 19990430 |
1 | 0.0019 | 19990531 |
1 | 0.0019 | 19990630 |
1 | 0.0042 | 19990731 |
1 | 0.0042 | 19990831 |
1 | 0.0042 | 19990930 |
1 | 0.0042 | 19991031 |
1 | 0.0042 | 19991130 |
1 | 0.0042 | 19991231 |
1 | 0.0042 | 20000131 |
1 | 0.0042 | 20000229 |
1 | 0.0042 | 20000331 |
1 | 0.0042 | 20000430 |
1 | 0.0042 | 20000531 |
1 | 0.0042 | 20000630 |
1 | 0.0042 | 20000731 |
2 | 0 | 19970131 |
2 | 0 | 19970228 |
2 | 0 | 19970331 |
2 | 0 | 19970430 |
2 | 0 | 19970531 |
2 | 0 | 19970630 |
2 | 0 | 19970731 |
2 | 0 | 19970831 |
2 | 0 | 19970930 |
2 | 0 | 19971031 |
2 | 0 | 19971130 |
2 | 0 | 19971231 |
2 | 0 | 19980131 |
2 | 0 | 19980228 |
2 | 0 | 19980331 |
2 | 0 | 19980430 |
2 | 0 | 19980531 |
2 | 0 | 19980630 |
2 | 0 | 19980731 |
2 | 0 | 19980831 |
2 | 0 | 19980930 |
2 | 0 | 19981031 |
2 | 0 | 19981130 |
2 | 0 | 19981231 |
2 | 0 | 19990331 |
2 | 0 | 19990630 |
2 | 0 | 19990930 |
2 | 0 | 19991231 |
2 | 0 | 20000331 |
2 | 0 | 20000630 |
2 | 0 | 20000731 |
2 | 0 | 20000831 |
In this situation how do I get the dataset I wanted?
If you have SAS ETS, try PROC TIMESERIES.
If you don't, a manual data step with a start/end date and a loop works:
data want;
set have;
do date= start_date to end_date;
output;
end;
run;
@jkim197 wrote:
Hello all!
I want to convert mixed quarterly and annual frequency data to monthly data in panel data setting.
I have
summary_period2 : represents frequency, (A: annual, Q: quarterly, AQ:quarterly)
crsp_fundno : entity id variable
caldt : calendar date
end_dt : ending calendar date for each entitiy
summary_period2 crsp_fundno exp_ratio caldt end_dt A 1 0.0025 19971231 20000731 A 1 0.0019 19981231 20000731 Q 1 0.0019 19990331 20000731 Q 1 0.0019 19990630 20000731 Q 1 0.0042 19990930 20000731 AQ 1 0.0042 19991231 20000731 Q 1 0.0042 20000331 20000731 Q 1 0.0042 20000630 20000731 A 2 0 19971231 20000831 A 2 0 19981231 20000831 Q 2 0 19990331 20000831 Q 2 0 19990630 20000831 Q 2 0 19990930 20000831 AQ 2 0 19991231 20000831 Q 2 0 20000331 20000831 Q 2 0 20000630 20000831
data have; input summary_period2 $ crsp_fundno exp_ratio caldt end_dt; datalines; A 1 0.0025 19971231 20000731 A 1 0.0019 19981231 20000731 Q 1 0.0019 19990331 20000731 Q 1 0.0019 19990630 20000731 Q 1 0.0042 19990930 20000731 AQ 1 0.0042 19991231 20000731 Q 1 0.0042 20000331 20000731 Q 1 0.0042 20000630 20000731 A 2 0 19971231 20000831 A 2 0 19981231 20000831 Q 2 0 19990331 20000831 Q 2 0 19990630 20000831 Q 2 0 19990930 20000831 AQ 2 0 19991231 20000831 Q 2 0 20000331 20000831 Q 2 0 20000630 20000831 ; run;
and want
crsp_fundno exp_ratio caldt 1 0.0025 19970131 1 0.0025 19970228 1 0.0025 19970331 1 0.0025 19970430 1 0.0025 19970531 1 0.0025 19970630 1 0.0025 19970731 1 0.0025 19970830 1 0.0025 19970931 1 0.0025 19971030 1 0.0025 19971131 1 0.0025 19971231 1 0.0019 19980131 1 0.0019 19980228 1 0.0019 19980331 1 0.0019 19980430 1 0.0019 19980531 1 0.0019 19980630 1 0.0019 19980731 1 0.0019 19980831 1 0.0019 19980930 1 0.0019 19981031 1 0.0019 19981130 1 0.0019 19981231 1 0.0019 19990131 1 0.0019 19990228 1 0.0019 19990331 1 0.0019 19990430 1 0.0019 19990531 1 0.0019 19990630 1 0.0042 19990731 1 0.0042 19990831 1 0.0042 19990930 1 0.0042 19991031 1 0.0042 19991130 1 0.0042 19991231 1 0.0042 20000131 1 0.0042 20000229 1 0.0042 20000331 1 0.0042 20000430 1 0.0042 20000531 1 0.0042 20000630 1 0.0042 20000731 2 0 19970131 2 0 19970228 2 0 19970331 2 0 19970430 2 0 19970531 2 0 19970630 2 0 19970731 2 0 19970831 2 0 19970930 2 0 19971031 2 0 19971130 2 0 19971231 2 0 19980131 2 0 19980228 2 0 19980331 2 0 19980430 2 0 19980531 2 0 19980630 2 0 19980731 2 0 19980831 2 0 19980930 2 0 19981031 2 0 19981130 2 0 19981231 2 0 19990331 2 0 19990630 2 0 19990930 2 0 19991231 2 0 20000331 2 0 20000630 2 0 20000731 2 0 20000831
In this situation how do I get the dataset I wanted?
@Reeza, the line
do date= start_date to end_date;
will loop a day by day.
If I understand correctly, the loop should be by month, the the code should be:
(the code was droped from my previos post)
data want;
set have(rename=(caldt = start_dt));
caldt = start_dt;
drop start_dt end_dt;
do while (caldt le end_dt);
output;
caldt = intnx('month', caldt, 1 ,'end');
end;
run;
data have;
input summary_period2 $ crsp_fundno exp_ratio caldt : yymmdd10. end_dt : yymmdd10.;
format caldt yymmdd10. end_dt yymmdd10.;
datalines;
A 1 0.0025 19971231 20000731
A 1 0.0019 19981231 20000731
Q 1 0.0019 19990331 20000731
Q 1 0.0019 19990630 20000731
Q 1 0.0042 19990930 20000731
AQ 1 0.0042 19991231 20000731
Q 1 0.0042 20000331 20000731
Q 1 0.0042 20000630 20000731
A 2 0 19971231 20000831
A 2 0 19981231 20000831
Q 2 0 19990331 20000831
Q 2 0 19990630 20000831
Q 2 0 19990930 20000831
AQ 2 0 19991231 20000831
Q 2 0 20000331 20000831
Q 2 0 20000630 20000831
;
run;
data want(drop=caldt _n rename=(n=caldt));
set have;
by crsp_fundno;
retain _n;
if first.crsp_fundno then do;
call missing(n);
n=intnx('month',intnx('year',caldt,0,'b')-1,0,'e');
end;
else n=_n;
do until(n=caldt);
n=intnx('month',n,1,'e');
output;
end;
if last.crsp_fundno then do;n=end_dt;output;end;
_n=n;
format n _n yymmdd10.;
run;
@jkim197I need some in help in understanding your expected results for crsp_fundno =2
2 | 0 | 19991231 |
2 | 0 | 20000331 |
2 | 0 | 20000630 |
Seems like i am missing something as I am not spotting the month sequence above
@jkim197wrote:
if I dont want to interpolate the exp_ratio so that
I stop at last caldt, what should I do?
1 0.0042 20000131
1 0.0042 20000229
1 0.0042 20000331
1 0.0042 20000430
1 0.0042 20000531
1 0.0042 20000630
* stop at here not 20000731
For that all you need to is comment out --->if last.crsp_fundno then do;n=end_dt;output;end;
data have;
input summary_period2 $ crsp_fundno exp_ratio caldt : yymmdd10. end_dt : yymmdd10.;
format caldt yymmdd10. end_dt yymmdd10.;
datalines;
A 1 0.0025 19971231 20000731
A 1 0.0019 19981231 20000731
Q 1 0.0019 19990331 20000731
Q 1 0.0019 19990630 20000731
Q 1 0.0042 19990930 20000731
AQ 1 0.0042 19991231 20000731
Q 1 0.0042 20000331 20000731
Q 1 0.0042 20000630 20000731
A 2 0 19971231 20000831
A 2 0 19981231 20000831
Q 2 0 19990331 20000831
Q 2 0 19990630 20000831
Q 2 0 19990930 20000831
AQ 2 0 19991231 20000831
Q 2 0 20000331 20000831
Q 2 0 20000630 20000831
;
run;
data want(drop=caldt _n rename=(n=caldt));
set have;
by crsp_fundno;
retain _n;
if first.crsp_fundno then do;
call missing(_n);
n=intnx('month',intnx('year',caldt,0,'b')-1,0,'e');
end;
else n=_n;
do until(n=caldt);
n=intnx('month',n,1,'e');
output;
end;
/*if last.crsp_fundno then do;n=end_dt;output;end;*/
_n=n;
format n _n yymmdd10.;
run;
I'll look forward to your feedback and see if you require any further changes/edit. Very interesting question indeed! Thank you!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.