I am having trouble figuring out how to create entries for missing dates in my time-series data. I'm trying to use proc expand and I'm able to get entries for dates between the already existing dates shown, but I need to get entries for all quarters between 2003 and 2014. Here is a sample of my problem with SAS code.
data test;
input cust reg qtr year sales;
informat qtr yyq6.;
format qtr yyq6.;
datalines;
1 2 2003Q1 2003 20
1 2 2003Q2 2003 24
1 2 2004Q1 2004 19
2 6 2004Q2 2004 77
2 6 2005Q4 2005 66
;
run;
proc sort data=test;
by cust reg qtr;
run;
proc expand data=test out=new from=qtr to=qtr method=none;
by cust reg;
id qtr;
convert sales=sales/transformout=(setmiss 0);
run;
The resulting dataset looks as follows:
As you can see, it adds entries for the quarters between the already existing quarter entries, but I need entries for all quarters between 2003 and 2014.
Hello -
As an alternative you may want to run PROC TIMESERIES instead of PROC EXPAND.
Thanks,
Udo
data test;
input cust reg qtr year sales;
informat qtr yyq6.;
format qtr yyq6.;
datalines;
1 2 2003Q1 2003 20
1 2 2003Q2 2003 24
1 2 2004Q1 2004 19
2 6 2004Q2 2004 77
2 6 2005Q4 2005 66
;
run;
proc sort data=test;
by cust reg qtr;
run;
proc timeseries data=test out=new;
id qtr interval=qtr start="01JAN2003"d end="31DEC2014"d setmissing=0;
var sales;
by cust reg;
run;
Do you have a 2014Q4 value in your base data? I think expand only goes to the maximum value you provide. Add a record for 2014Q4 and see what happens.
Yes, that did add values up to 2014Q4, but only within the customer that I added a value for that quarter. So it seems that this only adds values for dates between the lowest and highest quarter within each customer. I somehow need to get values for all quarters between 2003 and 2014 for every customer. This dataset contains only two customers, but my actual dataset contains much more, and most customers do not have entries for the lowest (2003Q1) and highest (2014Q4) quarters. Might I need a different method?
Try this way:
data test;
input cust reg qtr year sales;
informat qtr yyq6.;
format qtr yyq6.;
datalines;
1 2 2003Q1 2003 20
1 2 2003Q2 2003 24
1 2 2004Q1 2004 19
2 6 2004Q2 2004 77
2 6 2005Q4 2005 66
;
proc sort data=test;
by cust reg qtr;
run;
/* List the required dates */
data qtr;
do year = 2003 to 2014;
do quarter = 0 to 3;
qtr = mdy(1+3*quarter,1,year);
output;
end;
end;
format qtr yyq6.;
keep year qtr;
run;
proc sql;
create table test2 as
select C.cust, C.reg, Q.qtr, Q.year, T.sales
from qtr as Q cross join
(select unique cust, reg from test) as C left join
test as T on C.cust=T.cust and C.reg=T.reg and Q.qtr=T.qtr
order by cust, reg, year, qtr;
quit;
proc expand data=test2 out=new from=qtr to=qtr method=none extrapolate;
by cust reg;
id qtr;
convert sales=sales/transformout=(setmiss 0);
run;
PG
Hello -
As an alternative you may want to run PROC TIMESERIES instead of PROC EXPAND.
Thanks,
Udo
data test;
input cust reg qtr year sales;
informat qtr yyq6.;
format qtr yyq6.;
datalines;
1 2 2003Q1 2003 20
1 2 2003Q2 2003 24
1 2 2004Q1 2004 19
2 6 2004Q2 2004 77
2 6 2005Q4 2005 66
;
run;
proc sort data=test;
by cust reg qtr;
run;
proc timeseries data=test out=new;
id qtr interval=qtr start="01JAN2003"d end="31DEC2014"d setmissing=0;
var sales;
by cust reg;
run;
It looks like this works as well. I will also try to use this method to apply to my larger dataset and see if I get correct results.
Thanks!
It looks like this did the trick. I will try to apply the same method to my larger dataset and see if i can get the same results.
Thanks!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.