Fill missing data with closest data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Fill missing data with closest data

data have;

  input idno weight;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

goal: for each missing data, fill in with the closest not-missing data. but if the closest not-missing data is more than 3 observations away, leave it blank.

thanks for the help!

expected output:

1     150

1     150

1     150

1     130

1     130

2     140

2     140

2     140

3     .

3     160

3     160

3     160

3     160


Accepted Solutions
Solution
‎10-02-2012 04:35 PM
Super User
Posts: 5,503

Re: Fill missing data with closest data

Posted in reply to Davidliu494

Two related questions ...

Should we assume that the "closest" value must come from the same IDNO?

How do you handle ties, such as:

2  140

2      .

2  145

(The answer is allowed to be, "I don't care, just pick one.")

View solution in original post


All Replies
Solution
‎10-02-2012 04:35 PM
Super User
Posts: 5,503

Re: Fill missing data with closest data

Posted in reply to Davidliu494

Two related questions ...

Should we assume that the "closest" value must come from the same IDNO?

How do you handle ties, such as:

2  140

2      .

2  145

(The answer is allowed to be, "I don't care, just pick one.")

Occasional Contributor
Posts: 9

Re: Fill missing data with closest data

Posted in reply to Astounding

yup, "closest" value must come from the same idno, and when ties, pick which ever one,

thanks!

PROC Star
Posts: 7,471

Re: Fill missing data with closest data

Posted in reply to Davidliu494

There is probably a more direct route but, if your actual data don't get any more complex than your example, you might be able to achieve what you want with something like:

data have;

  input idno weight;

  if missing(weight) then weight=-1;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

proc transpose data=have out=need;

  by idno;

run;

data need (drop=i j);

  set need;

  array weights(*) col:;

  do i=1 to dim(weights);

    if weights(i) ge 0 then leave;

  end;

  do j=1 to i-1;

    if j ge i-3 and weights(j) eq -1 then

     weights(j)=weights(i);

  end;

  do j=i+1 to dim(weights);

    if j le i+3 and weights(j) eq -1 then

     weights(j)=weights(i);

  end;

run;

proc transpose data=need out=want (drop=_name_ where=(not missing(weight)));

  by idno;

run;

data want;

  set want;

  if weight eq -1 then call missing(weight);

run;

Super User
Posts: 5,503

Re: Fill missing data with closest data

Posted in reply to Davidliu494

OK, here's one way (not checked for syntax, nor even for correctness!).  It assumes your data set is already sorted.

proc transpose data=have out=all_ages (keep=idno age_Smiley Happy prefix=age_;

   by idno;

   var age;

run;

data want;

   merge have all_ages;

   by idno;

   array ages {*} age_:;

   if first.idno then count=1;

   else count+1;

   if age=. then do;

     do _i_=max(1, count-1), min(dim(ages), count+1), max(1, count-2), min(dim(ages), count+2), max(1, count-3), min(dim(ages), count+3)

          until (age > .);

         if ages{_i_} > . then age = ages{_i_};

      end;

   end;

   drop ages_: _i_ count;

run;

For efficiency purposes, it is possible to cut down on the number of uses of the DIM function.  But the idea is workable as is.  Good luck.

Looks like I used AGE instead of WEIGHT, but the idea is the same.

PROC Star
Posts: 7,471

Re: Fill missing data with closest data

Posted in reply to Davidliu494

Here is another possibility:

data have;

  input idno weight;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

data need1 need2 (rename=(recnum=recnumb weight=weightb));

  set have;

  by idno;

  if first.idno then do;

    recnum=0;

    counter=1;

    non_missing_counter=0;

  end;

  recnum+1;

  if not missing(weight) then do;

    non_missing_counter+1;

    if non_missing_counter gt 1 then counter+1;

    output need1;

    output need2;

  end;

  else output need1;

run;

data want (keep=idno weight);

  set need1 (drop=weight);

  by idno counter;

  if first.counter then set need2;

  if abs(recnum-recnumb) gt 3 then call missing(weight);

  else weight=weightb;

run;

Respected Advisor
Posts: 3,156

Re: Fill missing data with closest data

Posted in reply to Davidliu494

Here is another data step solution, one-step, in trading for an arbitrary number of idno group size.

data have;

  input idno weight;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

data want;

  array t(100) _temporary_;

  array nm(100) _temporary_;

    do _n_=1 by 1 until (last.idno);

      set have;

        by idno;

           t(_n_)=weight;

           if not missing(weight) then do;

              _i+1;

              nm(_i)=_n_;

            end;

    end;

    do _j=1 to _n_;

          do _k=1 to _i;

            _min=min(_min,abs(_j-nm(_k)));

            if _min=abs(_j-nm(_k)) then  _n1=nm(_k);

          end;

          call missing(_min);

          if abs(_n1-_j)>3 then call missing(weight); else weight=t(_n1);output;

    end;

    call missing(of _i t(*) nm(*));

    drop _:;

run;

proc print;run;

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 339 views
  • 6 likes
  • 4 in conversation