BookmarkSubscribeRSS Feed
siare1023
SAS Employee

I'm trying to add in additional rows (of 0 values) to fill in for missing rows. For example, if I only have 5 rows of data (each row represents a month, so Jan-May), I want to add in 7 more rows (Jun-Dec) having value is 0. 

 

Current data:

Screenshot 2024-11-11 131700.png

 

 

 

 

 

 

 

Goal:

Screenshot 2024-11-11 132702.png

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Assuming col1, col2, and col3 can stay the same. What is the best way to automatically find how many rows are missing and then fill in the remaining rows with 0 values? 

data temp;
infile datalines delimiter=',';
	input col1 $ col2 $ col3 $ year month value;
	datalines;
a,b,c,2024,1,15
a,b,c,2024,2,30
a,b,c,2024,3,45
a,b,c,2024,4,61
a,b,c,2024,5,77
;
run;

 

Thank you! 

6 REPLIES 6
Tom
Super User Tom
Super User

Please explain how a human would determine if there are missing months?

Do you have an expected start and stop month?

 

Do you have any grouping variables, such as a subject id?

 

The simplest way is to make an empty datasets with all zeros and the merge it with the original.

 

Let's assume already have a dataset named HAVE that is sorted by the variables that from your picture seem to have named COL1 to COL3.  And that you want to make sure that each combination of COL1, COL2 and COL3 in the dataset has 12 observations.

 

For this simple case of 12 months all on one year just do something like this to create an empty dataset with 12 observations for each grouping and VALUE set to zero in all of them.  Then remerge it with the original HAVE dataset so that the actual values overwrite the zeros.

data empty;
  set have;
  by col1-col3;
  if first.col3 then do;
     year=2024;
     value=0;
     do month=1 to 12;
        output;
     end;
  end;
run;
data want;
  merge empty have;
  by col1-col3 year month;
run;

If the time interval is not all within one year then you can use INTCK, INTNX, YEAR and MONTH functions instead.  Basically loop over the number of intervals in your time period.

do offset=0 to intck('month','01JAN2024'd,'01DEC20204'd);
   date=intnx('month','01JAN2024',offset);
   year=year(date);
   month=month(date);
   output;
run;
Astounding
PROC Star

A simple solution for the simplest form that the problem might take on:

data want;
   set have end=done;
   output;
   if done;
   value = 0;
   if month < 12 then do month = month + 1 to 12;
      output;
   end;
run;

If more is required, let us know why.

Stu_SAS
SAS Employee

One way to do this is by converting your year/month into a SAS date and run it through PROC TIMESERIES to fill in the end of the series.

 

data temp2;
    set temp end=eof;
    month2 = mdy(month, 1, year);

    if(eof) then call symputx('end', put(mdy(12, 1, year), date9.));

    format month2 monyy.;
run;

proc timeseries data=temp2 out=temp3;
    by col1 col2 col3;
    id month2 interval=month end="&end"d;
    var value / setmissing=0;
run;

data want;
    set temp3;

    year  = year(month2);
    month = month(month2);

    drop month2;
run;

Stu_SAS_0-1731360979465.png

 

Ksharp
Super User
data temp;
infile datalines delimiter=',';
	input col1 $ col2 $ col3 $ year month value;
	datalines;
a,b,c,2024,1,15
a,b,c,2024,2,30
a,b,c,2024,3,45
a,b,c,2024,4,61
a,b,c,2024,5,77
;
run;
data month;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table want as
select a.*,coalesce(b.value,0) as value
 from 
(select * from (select distinct col1,col2,col3,year from temp),(select distinct month from month)) as a
natural left join 
temp as b;
quit;
quickbluefish
Quartz | Level 8
data temp;
infile datalines delimiter=',';
	input col1 $ col2 $ col3 $ year month value;
	datalines;
a,b,c,2024,1,15
a,b,c,2024,2,30
a,b,c,2024,3,45
a,b,c,2024,4,61
a,b,c,2024,5,77
;
run;

proc sort data=want; by year; run;

data complete;
set temp;
by year;
array v {12} _temporary_;
if first.year then call missing(of v[*]);
v[month]=value;
if last.year then do;
  do month=1 to 12;
    col1='a'; col2='b'; col3='c';
    value=coalesce(v[month], 0);
    output;
  end;
end;
run;
s_lassen
Meteorite | Level 14

Here is code that fills out all missing months in the years in the data (data must be sorted as in the BY statement):

data want;
  set temp;
  by col1 col2 col3 year;
  _prev_month=lag(month);
  _value=value;
  value=0;
  if first.year then do month=1 to month-1;
    output;
    end;
  else do month=_prev_month+1 to month-1;
    output;
    end;
  value=_value;
  output;
  if last.year;
  value=0;
  do month=month+1 to 12;
    output;
    end;
  drop _:;
run;

This is maybe not the most elegant solution, but it may execute faster than the other solutions suggested.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 492 views
  • 0 likes
  • 7 in conversation