I have a time series dataset with 418 variables. For every variable, I want to fill each missing value with the previous available value. If no previous value is available (for example, when the first value of the series is missing), the missing value should be filled with the next available value. With so many variables I can't fill them one by one, so I need a single piece of code that processes all variables together.
How about this code?
data sample;
length a b c $10 d e f 8;
infile datalines dsd missover;
input a b c d e f;
datalines;
aa,bb, ,1, ,3
aa, ,cc,1,2,3
dd,x , , ,6,
, ,bc,9, ,7
, ,cc, , ,
;
run;
/* remember number of variables */
data _null_;
set sashelp.vtable;
where memname='SAMPLE';
call symputx('cvars',num_character);
call symputx('nvars',num_numeric);
run;
data want(drop=i);
set sample;
array _n{&nvars.} _numeric_;
array _c{&cvars.} _character_;
array _rn{&nvars.} 8 _temporary_;/* variables for keeping data of previous observation */
array _rc{&cvars.} $ 10 _temporary_;/* values of the previous observation; length 10 matches a, b, c (the default would be 8) */
retain _rc: _rn:;
do i=1 to dim(_n);
if missing(_n{i}) then _n{i}=_rn{i};
_rn{i}=_n{i};
end;
do i=1 to dim(_c);
if missing(_c{i}) then _c{i}=_rc{i};
_rc{i}=_c{i};
end;
run;
OK.
You need to modify the two statements below.
If your dataset is temp.original, then use
where libname=upcase('temp') and memname=upcase('original');
and
set temp.original;
That's all.
/* remember number of variables */
data _null_;
set sashelp.vtable;
where libname=upcase('WORK') and memname=upcase('SAMPLE');/* modify this line to your library and dataset name */
call symputx('cvars',num_character);
call symputx('nvars',num_numeric);
run;
data want(drop=i);
set work.sample;/* modify your dataset name */
array _n{&nvars.} _numeric_;
array _c{&cvars.} _character_;
array _rn{&nvars.} 8 _temporary_;/* variables for keeping data of previous observation */
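array _rc{&cvars.} $ 200 _temporary_;/* values of the previous observation; 200 is an assumed upper bound, set it to the length of your longest character variable (the default would be 8) */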
retain _rc: _rn:;
do i=1 to dim(_n);
if missing(_n{i}) then _n{i}=_rn{i};
_rn{i}=_n{i};
end;
do i=1 to dim(_c);
if missing(_c{i}) then _c{i}=_rc{i};
_rc{i}=_c{i};
end;
run;
That is because there are no character variables in your dataset, I think.
Run the code below.
It checks whether character and numeric variables exist.
data want(drop=i);
set work.sample;/* modify your dataset name */
%if &nvars>0 %then %do;
array _n{&nvars.} _numeric_;
array _rn{&nvars.} 8 _temporary_;/* variables for keeping data of previous observation */
retain _rn:;
do i=1 to dim(_n);
if missing(_n{i}) then _n{i}=_rn{i};
_rn{i}=_n{i};
end;
%end;
%if &cvars>0 %then %do;
array _c{&cvars.} _character_;
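array _rc{&cvars.} $ 200 _temporary_;/* values of the previous observation; 200 is an assumed upper bound, set it to the length of your longest character variable (the default would be 8) */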
retain _rc:;
do i=1 to dim(_c);
if missing(_c{i}) then _c{i}=_rc{i};
_rc{i}=_c{i};
end;
%end;
run;
Yes, you are right, I don't have any character variables. Sorry, I should've mentioned that before. The new code gives the same errors for the character variable section. It also shows some new errors:
ERROR: The %IF statement is not valid in open code
and
ERROR: The %END statement is not valid in open code.
Thanks for patiently replying to all my queries.
OK. This version is for numeric variables only.
data want(drop=i);
set work.sample;/* modify your dataset name */
array _n{&nvars.} _numeric_;
array _rn{&nvars.} 8 _temporary_;/* variables for keeping data of previous observation */
retain _rn:;
do i=1 to dim(_n);
if missing(_n{i}) then _n{i}=_rn{i};
_rn{i}=_n{i};
end;
run;
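The two %IF/%END errors reported above come from using macro logic outside a macro definition. A minimal sketch of one way around that is to wrap the data step in a macro. The macro name fill_missing and its in= and out= parameters are hypothetical, the character length of 200 is an assumed upper bound, and &nvars and &cvars are assumed to have been set by the data _null_ step shown earlier. (SAS 9.4M5 and later also accept a simple, non-nested %IF-%THEN-%DO in open code.)

```sas
/* Hypothetical wrapper: %IF/%END are valid here because they sit inside a macro. */
/* Assumes &nvars and &cvars already exist (see the data _null_ step above).      */
%macro fill_missing(in=work.sample, out=want);
data &out(drop=i);
  set &in;
  %if &nvars > 0 %then %do;
    array _n{&nvars.} _numeric_;
    array _rn{&nvars.} 8 _temporary_;      /* last non-missing numeric values */
    do i=1 to dim(_n);
      if missing(_n{i}) then _n{i}=_rn{i}; /* carry previous value forward */
      _rn{i}=_n{i};
    end;
  %end;
  %if &cvars > 0 %then %do;
    array _c{&cvars.} _character_;
    array _rc{&cvars.} $ 200 _temporary_;  /* 200 is an assumed maximum length */
    do i=1 to dim(_c);
      if missing(_c{i}) then _c{i}=_rc{i};
      _rc{i}=_c{i};
    end;
  %end;
run;
%mend fill_missing;

%fill_missing(in=work.sample, out=want);
```

With no character variables, &cvars is 0 and the character section is never generated, so the zero-sized array that caused the earlier errors does not appear.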
Basically, you can do this by sorting in descending order, because the data step reads observations from top to bottom and can only carry a previous value forward.
The procedure is to fill forward once, sort in descending order, fill forward again (which fills the leading gaps from the next available value), and then re-sort in ascending order.
To make the re-sort possible, it is better to keep the observation number as the key, using _n_ in the first data step.
/* remember number of variables */
data _null_;
set sashelp.vtable;
where libname=upcase('WORK') and memname=upcase('SAMPLE');/* modify this line to your library and dataset name */
call symputx('nvars',num_numeric+1);/* +1:for key variable */
run;
data want(drop=i);
set work.sample;/* modify your dataset name */
key=_n_;
array _n{&nvars.} _numeric_;
array _rn{&nvars.} 8 _temporary_;/* variables for keeping data of previous observation */
retain _rn:;
do i=1 to dim(_n);
if missing(_n{i}) then _n{i}=_rn{i};
_rn{i}=_n{i};
end;
run;
data sort;
set want;
run;
proc sort data=sort;
by descending key;
run;
data want(drop=i);
set work.sort;
array _n{&nvars.} _numeric_;
array _rn{&nvars.} 8 _temporary_;/* variables for keeping data of previous observation */
retain _rn:;
do i=1 to dim(_n);
if missing(_n{i}) then _n{i}=_rn{i};
_rn{i}=_n{i};
end;
run;
proc sort data=want out=want(drop=key);
by key;
run;
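As a quick check, something like the following could confirm that no gaps remain. It assumes the output dataset is still named want. A variable that is missing in every observation will still show a nonzero NMISS, since there is nothing to fill it from.

```sas
/* Count non-missing (N) and missing (NMISS) values for every numeric variable */
proc means data=want n nmiss;
run;
```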
Thank you so much. This should be the accepted solution; I chose the previous one. This worked perfectly.