I have a data set as below, for some products, during some months, the request quantities are missing. I know they are 0.
For a given time periods (e.g. from Jun-2011 to Oct-2011), how to add those missing observation by programming?
ProductArea Product_Level_1 Request_Period_DT Request_Quantity
AAA bbb nov-2010 100
AAA bbb mar-2011 80
AAA bbb apr-2011 100
AAA bbb may-2011 90
AAA bbb sep-2011 200
AAA ccc jun-2011 100
AAA ccc jul-2011 50
AAA ccc aug-2011 80
AAA ccc sep-2011 90
BBB ddd jul-2011 100
BBB eee mar-2011 80
BBB eee apr-2011 100
BBB eee may-2011 90
BR//alex
There are a number of ways to do it, some of which (like using proc expand) require products beyond base sas. One way, not the most efficient but easy to follow, is to use a datastep merge. E.g.:
data have;
input (ProductArea Product_Level_1) ($)
Request_Period_DT anydtdte. Request_Quantity;
format Request_Period_DT monyy7.;
cards;
AAA bbb nov-2010 100
AAA bbb mar-2011 80
AAA bbb apr-2011 100
AAA bbb may-2011 90
AAA bbb sep-2011 200
AAA ccc jun-2011 100
AAA ccc jul-2011 50
AAA ccc aug-2011 80
AAA ccc sep-2011 90
BBB ddd jul-2011 100
BBB eee mar-2011 80
BBB eee apr-2011 100
BBB eee may-2011 90
;
/create a file with one record for each ProductArea Product_Level_1 combination*/
proc sort data=have out=missing nodupkey;
by ProductArea Product_Level_1;
run;
/*create macro variables for the 2 dates of concern*/
%let first=01jun2011;
%let last=01oct2011;
/*expand the file missing to include a record for every month in the range*/
data missing;
set missing;
by ProductArea Product_Level_1;
Request_Quantity=0;
do Request_Period_DT="&first."d to "&last."d;
if day(Request_Period_DT) eq 1 then output;
end;
run;
/*merge have and the expanded missing*/
data want;
merge missing (in=ina) have (in=inb);
by ProductArea Product_Level_1 Request_Period_DT;
if inb or (ina and not inb);
run;
There are a number of ways to do it, some of which (like using proc expand) require products beyond base sas. One way, not the most efficient but easy to follow, is to use a datastep merge. E.g.:
data have;
input (ProductArea Product_Level_1) ($)
Request_Period_DT anydtdte. Request_Quantity;
format Request_Period_DT monyy7.;
cards;
AAA bbb nov-2010 100
AAA bbb mar-2011 80
AAA bbb apr-2011 100
AAA bbb may-2011 90
AAA bbb sep-2011 200
AAA ccc jun-2011 100
AAA ccc jul-2011 50
AAA ccc aug-2011 80
AAA ccc sep-2011 90
BBB ddd jul-2011 100
BBB eee mar-2011 80
BBB eee apr-2011 100
BBB eee may-2011 90
;
/create a file with one record for each ProductArea Product_Level_1 combination*/
proc sort data=have out=missing nodupkey;
by ProductArea Product_Level_1;
run;
/*create macro variables for the 2 dates of concern*/
%let first=01jun2011;
%let last=01oct2011;
/*expand the file missing to include a record for every month in the range*/
data missing;
set missing;
by ProductArea Product_Level_1;
Request_Quantity=0;
do Request_Period_DT="&first."d to "&last."d;
if day(Request_Period_DT) eq 1 then output;
end;
run;
/*merge have and the expanded missing*/
data want;
merge missing (in=ina) have (in=inb);
by ProductArea Product_Level_1 Request_Period_DT;
if inb or (ina and not inb);
run;
Hi art297, thank you very much for your kindly answer, it is very clear and this is exactly what I am looking for! I'll test it in system.
I am very new in sas programming, is it possible for you to explain a little bit more for others solutions (like using proc expand) in brief?
I'll have to leave that for someone more versed than I with proc expand.
Hello -
Here is an example of how to use PROC TIMESERIES (of SAS/ETS) to create time series data from your original data (including 0 values for dates which are missing in your data).
I was assuming that you would like to see all series to be of similar length - otherwise, just leave out the START and END option.
Thanks,
Udo
data have;
input (ProductArea Product_Level_1) ($)
Request_Period_DT anydtdte. Request_Quantity;
format Request_Period_DT monyy7.;
cards;
AAA bbb nov-2010 100
AAA bbb mar-2011 80
AAA bbb apr-2011 100
AAA bbb may-2011 90
AAA bbb sep-2011 200
AAA ccc jun-2011 100
AAA ccc jul-2011 50
AAA ccc aug-2011 80
AAA ccc sep-2011 90
BBB ddd jul-2011 100
BBB eee mar-2011 80
BBB eee apr-2011 100
BBB eee may-2011 90
;
proc sort data=have;
by ProductArea Product_Level_1 Request_Period_DT;
run;
proc timeseries data=have out=want;
id Request_Period_DT interval=month accumulate=total start="01NOV2010"d end="01SEP2011"d setmissing=0;
var Request_Quantity;
by ProductArea Product_Level_1;
run;
Hi Udo, it works perfect, thanks a lot!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.