SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Fill missing value with the average of surrounding values

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

Fill missing value with the average of surrounding values

[ Edited ]

Hi, I currently have a problem that requires filling the missing value with surrounding value mean. For example:

data have;
input a ;
datalines;
1
2
.
.
.
2
;
run;

Here is what I need:

data want;
input a ;
datalines;
1
2
1.5
1.25
1.5
2
;
run;
As you may notice, the filling method is when there is only one missing, simply take the average of the surrounding values. If there are two continuous missing value, take the average of surrounding values to fill the first missing, then use the same method as only one missing value to fill the second missing. 
Any suggestion would be appreciated. Thank you.

Accepted Solutions
Solution
‎03-31-2018 10:50 AM
Esteemed Advisor
Posts: 5,474

Re: Fill missing value with the average of surrounding values

What is the advantage of such an interpolation method? Its simplicity maybe?

 

data have;
input a;
datalines;
1
2
2 
.
.
1
.
2
;

data want;
retain _a;
do nb = 1 by 1 until(not missing(a_));
    set have(rename=a=a_);
    end;
do i = 1 to nb;
    set have;
    if missing(a) then a = (_a + a_) / 2;
    _a = a;
    output;
    end;
drop nb i _a a_;
run;

proc print; run;
PG

View solution in original post


All Replies
Valued Guide
Posts: 558

Re: Fill missing value with the average of surrounding values

PROC EXPAND can be used for interpolating the moving average. But here depending on your condition you need to make some looping to make sure that there is only one missing value between consecutive records to get the output that you desired.

 

data have;
input a ;
obs=_n_;
datalines;
1
2
2 
.
.
1 
.
2
;
run;
data count_max_mis;
set have ;
if not missing(a) then count=0;
else count+1;
run;

proc sql;
select max(count) into: n_miss
	from count_max_mis;
quit;

%macro interpolate_miss();
%DO i=1 %to &n_miss;

data Inter;
set have;
if not missing(a) then call symput("value","NOTNULL");
else if missing(a) and SYMGET("value")="NOTNULL" then call symput("value","NULL");
else if missing(a) then delete;
run;

PROC EXPAND data=Inter out=Updated_miss(drop=time) method=none;
convert a / transformout=(missonly  cmovave 3 );
run;

data Have;
update have Updated_miss;
by obs;
run;
%end;
%MEND interpolate_miss;
%interpolate_miss;
Thanks,
Suryakiran
Contributor
Posts: 63

Re: Fill missing value with the average of surrounding values

Posted in reply to SuryaKiran
Thank you for the reply.
Solution
‎03-31-2018 10:50 AM
Esteemed Advisor
Posts: 5,474

Re: Fill missing value with the average of surrounding values

What is the advantage of such an interpolation method? Its simplicity maybe?

 

data have;
input a;
datalines;
1
2
2 
.
.
1
.
2
;

data want;
retain _a;
do nb = 1 by 1 until(not missing(a_));
    set have(rename=a=a_);
    end;
do i = 1 to nb;
    set have;
    if missing(a) then a = (_a + a_) / 2;
    _a = a;
    output;
    end;
drop nb i _a a_;
run;

proc print; run;
PG
Contributor
Posts: 63

Re: Fill missing value with the average of surrounding values

Thank you for the reply. It works well.
Super User
Posts: 10,681

Re: Fill missing value with the average of surrounding values

data have;
input a ;
datalines;
1
2
2 
.
.
1 
.
2
;
run;

data want;
 set have end=last;
 array x{1000} _temporary_;
 x{_n_}=a;
 if last then do;
  do i=1 to _n_;
    if missing(x{i}) then do;
      prev=x{i-1};next=.;
	  do j=i+1 to _n_;
        if not missing(x{j}) then do;next=x{j};leave;end;
	  end;
	  x{i}=(prev+next)/2;
	end;
  end;

  do k=1 to _n_;
    a=x{k};output;
  end;
 end;
keep a;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 215 views
  • 3 likes
  • 4 in conversation