Fill missing data with closest data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Fill missing data with closest data

data have;

  input idno weight;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

goal: for each missing data, fill in with the closest not-missing data. but if the closest not-missing data is more than 3 observations away, leave it blank.

thanks for the help!

expected output:

1     150

1     150

1     150

1     130

1     130

2     140

2     140

2     140

3     .

3     160

3     160

3     160

3     160


Accepted Solutions
Solution
‎10-02-2012 04:35 PM
Super User
Posts: 5,083

Re: Fill missing data with closest data

Two related questions ...

Should we assume that the "closest" value must come from the same IDNO?

How do you handle ties, such as:

2  140

2      .

2  145

(The answer is allowed to be, "I don't care, just pick one.")

View solution in original post


All Replies
Solution
‎10-02-2012 04:35 PM
Super User
Posts: 5,083

Re: Fill missing data with closest data

Two related questions ...

Should we assume that the "closest" value must come from the same IDNO?

How do you handle ties, such as:

2  140

2      .

2  145

(The answer is allowed to be, "I don't care, just pick one.")

Occasional Contributor
Posts: 9

Re: Fill missing data with closest data

yup, "closest" value must come from the same idno, and when ties, pick which ever one,

thanks!

PROC Star
Posts: 7,363

Re: Fill missing data with closest data

There is probably a more direct route but, if your actual data don't get any more complex than your example, you might be able to achieve what you want with something like:

data have;

  input idno weight;

  if missing(weight) then weight=-1;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

proc transpose data=have out=need;

  by idno;

run;

data need (drop=i j);

  set need;

  array weights(*) col:;

  do i=1 to dim(weights);

    if weights(i) ge 0 then leave;

  end;

  do j=1 to i-1;

    if j ge i-3 and weights(j) eq -1 then

     weights(j)=weights(i);

  end;

  do j=i+1 to dim(weights);

    if j le i+3 and weights(j) eq -1 then

     weights(j)=weights(i);

  end;

run;

proc transpose data=need out=want (drop=_name_ where=(not missing(weight)));

  by idno;

run;

data want;

  set want;

  if weight eq -1 then call missing(weight);

run;

Super User
Posts: 5,083

Re: Fill missing data with closest data

OK, here's one way (not checked for syntax, nor even for correctness!).  It assumes your data set is already sorted.

proc transpose data=have out=all_ages (keep=idno age_Smiley Happy prefix=age_;

   by idno;

   var age;

run;

data want;

   merge have all_ages;

   by idno;

   array ages {*} age_:;

   if first.idno then count=1;

   else count+1;

   if age=. then do;

     do _i_=max(1, count-1), min(dim(ages), count+1), max(1, count-2), min(dim(ages), count+2), max(1, count-3), min(dim(ages), count+3)

          until (age > .);

         if ages{_i_} > . then age = ages{_i_};

      end;

   end;

   drop ages_: _i_ count;

run;

For efficiency purposes, it is possible to cut down on the number of uses of the DIM function.  But the idea is workable as is.  Good luck.

Looks like I used AGE instead of WEIGHT, but the idea is the same.

PROC Star
Posts: 7,363

Re: Fill missing data with closest data

Here is another possibility:

data have;

  input idno weight;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

data need1 need2 (rename=(recnum=recnumb weight=weightb));

  set have;

  by idno;

  if first.idno then do;

    recnum=0;

    counter=1;

    non_missing_counter=0;

  end;

  recnum+1;

  if not missing(weight) then do;

    non_missing_counter+1;

    if non_missing_counter gt 1 then counter+1;

    output need1;

    output need2;

  end;

  else output need1;

run;

data want (keep=idno weight);

  set need1 (drop=weight);

  by idno counter;

  if first.counter then set need2;

  if abs(recnum-recnumb) gt 3 then call missing(weight);

  else weight=weightb;

run;

Respected Advisor
Posts: 3,124

Re: Fill missing data with closest data

Here is another data step solution, one-step, in trading for an arbitrary number of idno group size.

data have;

  input idno weight;

  cards;

1 .

1 150

1 .

1 .

1 130

2 .

2 140

2 .

3 .

3 .

3 .

3 .

3 160

;

run;

data want;

  array t(100) _temporary_;

  array nm(100) _temporary_;

    do _n_=1 by 1 until (last.idno);

      set have;

        by idno;

           t(_n_)=weight;

           if not missing(weight) then do;

              _i+1;

              nm(_i)=_n_;

            end;

    end;

    do _j=1 to _n_;

          do _k=1 to _i;

            _min=min(_min,abs(_j-nm(_k)));

            if _min=abs(_j-nm(_k)) then  _n1=nm(_k);

          end;

          call missing(_min);

          if abs(_n1-_j)>3 then call missing(weight); else weight=t(_n1);output;

    end;

    call missing(of _i t(*) nm(*));

    drop _:;

run;

proc print;run;

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 331 views
  • 6 likes
  • 4 in conversation