## Fill missing data with closest data

Solved
Occasional Contributor
Posts: 9

# Fill missing data with closest data

data have;

input idno weight;

cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

goal: for each missing data, fill in with the closest not-missing data. but if the closest not-missing data is more than 3 observations away, leave it blank.

thanks for the help!

expected output:

1     150

1     150

1     150

1     130

1     130

2     140

2     140

2     140

3     .

3     160

3     160

3     160

3     160

Accepted Solutions
Solution
‎10-02-2012 04:35 PM
Super User
Posts: 6,781

## Re: Fill missing data with closest data

Two related questions ...

Should we assume that the "closest" value must come from the same IDNO?

How do you handle ties, such as:

2  140

2      .

2  145

(The answer is allowed to be, "I don't care, just pick one.")

All Replies
Solution
‎10-02-2012 04:35 PM
Super User
Posts: 6,781

## Re: Fill missing data with closest data

Two related questions ...

Should we assume that the "closest" value must come from the same IDNO?

How do you handle ties, such as:

2  140

2      .

2  145

(The answer is allowed to be, "I don't care, just pick one.")

Occasional Contributor
Posts: 9

## Re: Fill missing data with closest data

yup, "closest" value must come from the same idno, and when ties, pick which ever one,

thanks!

PROC Star
Posts: 8,164

## Re: Fill missing data with closest data

There is probably a more direct route but, if your actual data don't get any more complex than your example, you might be able to achieve what you want with something like:

data have;

input idno weight;

if missing(weight) then weight=-1;

cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

proc transpose data=have out=need;

by idno;

run;

data need (drop=i j);

set need;

array weights(*) col:;

do i=1 to dim(weights);

if weights(i) ge 0 then leave;

end;

do j=1 to i-1;

if j ge i-3 and weights(j) eq -1 then

weights(j)=weights(i);

end;

do j=i+1 to dim(weights);

if j le i+3 and weights(j) eq -1 then

weights(j)=weights(i);

end;

run;

proc transpose data=need out=want (drop=_name_ where=(not missing(weight)));

by idno;

run;

data want;

set want;

if weight eq -1 then call missing(weight);

run;

Super User
Posts: 6,781

## Re: Fill missing data with closest data

OK, here's one way (not checked for syntax, nor even for correctness!).  It assumes your data set is already sorted.

proc transpose data=have out=all_ages (keep=idno age_ prefix=age_;

by idno;

var age;

run;

data want;

merge have all_ages;

by idno;

array ages {*} age_:;

if first.idno then count=1;

else count+1;

if age=. then do;

do _i_=max(1, count-1), min(dim(ages), count+1), max(1, count-2), min(dim(ages), count+2), max(1, count-3), min(dim(ages), count+3)

until (age > .);

if ages{_i_} > . then age = ages{_i_};

end;

end;

drop ages_: _i_ count;

run;

For efficiency purposes, it is possible to cut down on the number of uses of the DIM function.  But the idea is workable as is.  Good luck.

Looks like I used AGE instead of WEIGHT, but the idea is the same.

PROC Star
Posts: 8,164

## Re: Fill missing data with closest data

Here is another possibility:

data have;

input idno weight;

cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

data need1 need2 (rename=(recnum=recnumb weight=weightb));

set have;

by idno;

if first.idno then do;

recnum=0;

counter=1;

non_missing_counter=0;

end;

recnum+1;

if not missing(weight) then do;

non_missing_counter+1;

if non_missing_counter gt 1 then counter+1;

output need1;

output need2;

end;

else output need1;

run;

data want (keep=idno weight);

set need1 (drop=weight);

by idno counter;

if first.counter then set need2;

if abs(recnum-recnumb) gt 3 then call missing(weight);

else weight=weightb;

run;

Posts: 3,167

## Re: Fill missing data with closest data

Here is another data step solution, one-step, in trading for an arbitrary number of idno group size.

data have;

input idno weight;

cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

data want;

array t(100) _temporary_;

array nm(100) _temporary_;

do _n_=1 by 1 until (last.idno);

set have;

by idno;

t(_n_)=weight;

if not missing(weight) then do;

_i+1;

nm(_i)=_n_;

end;

end;

do _j=1 to _n_;

do _k=1 to _i;

_min=min(_min,abs(_j-nm(_k)));

if _min=abs(_j-nm(_k)) then  _n1=nm(_k);

end;

call missing(_min);

if abs(_n1-_j)>3 then call missing(weight); else weight=t(_n1);output;

end;

call missing(of _i t(*) nm(*));

drop _:;

run;

proc print;run;

Haikuo

🔒 This topic is solved and locked.