I'm trying to add in additional rows (of 0 values) to fill in for missing rows. For example, if I only have 5 rows of data (each row represents a month, so Jan-May), I want to add in 7 more rows (Jun-Dec) having value is 0.
Current data:
Goal:
Assuming col1, col2, and col3 can stay the same. What is the best way to automatically find how many rows are missing and then fill in the remaining rows with 0 values?
data temp;
infile datalines delimiter=',';
input col1 $ col2 $ col3 $ year month value;
datalines;
a,b,c,2024,1,15
a,b,c,2024,2,30
a,b,c,2024,3,45
a,b,c,2024,4,61
a,b,c,2024,5,77
;
run;
Thank you!
Please explain how a human would determine if there are missing months?
Do you have an expected start and stop month?
Do you have any grouping variables, such as a subject id?
The simplest way is to make an empty datasets with all zeros and the merge it with the original.
Let's assume already have a dataset named HAVE that is sorted by the variables that from your picture seem to have named COL1 to COL3. And that you want to make sure that each combination of COL1, COL2 and COL3 in the dataset has 12 observations.
For this simple case of 12 months all on one year just do something like this to create an empty dataset with 12 observations for each grouping and VALUE set to zero in all of them. Then remerge it with the original HAVE dataset so that the actual values overwrite the zeros.
data empty;
set have;
by col1-col3;
if first.col3 then do;
year=2024;
value=0;
do month=1 to 12;
output;
end;
end;
run;
data want;
merge empty have;
by col1-col3 year month;
run;
If the time interval is not all within one year then you can use INTCK, INTNX, YEAR and MONTH functions instead. Basically loop over the number of intervals in your time period.
do offset=0 to intck('month','01JAN2024'd,'01DEC20204'd);
date=intnx('month','01JAN2024',offset);
year=year(date);
month=month(date);
output;
run;
A simple solution for the simplest form that the problem might take on:
data want;
set have end=done;
output;
if done;
value = 0;
if month < 12 then do month = month + 1 to 12;
output;
end;
run;
If more is required, let us know why.
One way to do this is by converting your year/month into a SAS date and run it through PROC TIMESERIES to fill in the end of the series.
data temp2;
set temp end=eof;
month2 = mdy(month, 1, year);
if(eof) then call symputx('end', put(mdy(12, 1, year), date9.));
format month2 monyy.;
run;
proc timeseries data=temp2 out=temp3;
by col1 col2 col3;
id month2 interval=month end="&end"d;
var value / setmissing=0;
run;
data want;
set temp3;
year = year(month2);
month = month(month2);
drop month2;
run;
data temp;
infile datalines delimiter=',';
input col1 $ col2 $ col3 $ year month value;
datalines;
a,b,c,2024,1,15
a,b,c,2024,2,30
a,b,c,2024,3,45
a,b,c,2024,4,61
a,b,c,2024,5,77
;
run;
data month;
do month=1 to 12;
output;
end;
run;
proc sql;
create table want as
select a.*,coalesce(b.value,0) as value
from
(select * from (select distinct col1,col2,col3,year from temp),(select distinct month from month)) as a
natural left join
temp as b;
quit;
data temp;
infile datalines delimiter=',';
input col1 $ col2 $ col3 $ year month value;
datalines;
a,b,c,2024,1,15
a,b,c,2024,2,30
a,b,c,2024,3,45
a,b,c,2024,4,61
a,b,c,2024,5,77
;
run;
proc sort data=want; by year; run;
data complete;
set temp;
by year;
array v {12} _temporary_;
if first.year then call missing(of v[*]);
v[month]=value;
if last.year then do;
do month=1 to 12;
col1='a'; col2='b'; col3='c';
value=coalesce(v[month], 0);
output;
end;
end;
run;
Here is code that fills out all missing months in the years in the data (data must be sorted as in the BY statement):
data want;
set temp;
by col1 col2 col3 year;
_prev_month=lag(month);
_value=value;
value=0;
if first.year then do month=1 to month-1;
output;
end;
else do month=_prev_month+1 to month-1;
output;
end;
value=_value;
output;
if last.year;
value=0;
do month=month+1 to 12;
output;
end;
drop _:;
run;
This is maybe not the most elegant solution, but it may execute faster than the other solutions suggested.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.