Imputing missing value with Average value in Array

Reply
Established User
Posts: 1

Imputing missing value with Average value in Array

 Hello All,

 

I have data like -

 

PTVisit1 Visit2Visit3Visit4Visit5
10012.5711
100257..9
1003.56118
10047711..
1005.1189.

 

I want to use Array to impute the missing value with the average of adjacent non-missing value.

 

E.g. = for Pt. 1001 - for missing Visit2 I would expect the  average of (Visit1+Visit3)/2 

Same for Pt 1002  - for missing Visit3 and Visit 4, I would expect the  average of (Visit2+Visit5)/2 .

 

In later cases, I dont want imputation as I dont have adjacent values present completely.

 

Does anyone know how to handle this case in array?

 

Thanks,

PankP

Super User
Posts: 22,575

Re: Imputing missing value with Average value in Array

Can you transpose the data and then use a proc instead of dealing with it in a wide format?

 

I think PROC TIMESERIES or STDIZE will fill this in automatically if you make it a long format. 

 


Pankp wrote:

 Hello All,

 

I have data like -

 

PT Visit1  Visit2 Visit3 Visit4 Visit5
1001 2 . 5 7 11
1002 5 7 . . 9
1003 . 5 6 11 8
1004 7 7 11 . .
1005 . 11 8 9 .

 

I want to use Array to impute the missing value with the average of adjacent non-missing value.

 

E.g. = for Pt. 1001 - for missing Visit2 I would expect the  average of (Visit1+Visit3)/2 

Same for Pt 1002  - for missing Visit3 and Visit 4, I would expect the  average of (Visit2+Visit5)/2 .

 

In later cases, I dont want imputation as I dont have adjacent values present completely.

 

Does anyone know how to handle this case in array?

 

Thanks,

PankP


 

Super User
Posts: 12,734

Re: Imputing missing value with Average value in Array

Here's one way that works for your example data:

 

data have;
   input PT Visit1  Visit2 Visit3 Visit4 Visit5 ;
datalines;
1001 2 . 5 7 11 
1002 5 7 . . 9 
1003 . 5 6 11 8 
1004 7 7 11 . . 
1005 . 11 8 9 . 
;
run;

data want;
   set have;
   array v visit1-visit5 ;
   array newv {5} _temporary_ ;
   /* copy values into temp holding array newv*/
   do i=1 to dim(v);
      newv[i]=v[i];
   end;
   /* if the first or last of the array are 
      missing we aren't imputing
   */
   do i = 2 to (dim(v)-1);
      if missing(v[i]) then do;
         /* attempt to find value prior to missing*/
         do j= (i-1) to 1 by -1;
            if not missing(v[j]) then do;
               lower=v[j];
               leave;
            end;
         end;
         /* attempt to find value after missing*/
         do j= (i+1) to dim(v);
            if not missing(v[j]) then do;
               upper=v[j];
               leave;
            end;
         end;
         if missing(lower) or missing(upper) then;
         else  newv[i] = mean(lower,upper);
      end;/*missing v[i]*/
   end;
   /* copy the newv to v*/
   do i=1 to dim(v);
      v[i]=newv[i];
   end;       

   drop i j lower upper ;
run;


The temporary array is used as an easy way to hold the desired values but doesn't run into issues with your second line of data with two successive missing values where use of a single array would result in using the first imputed as the lower indexed value.

 

PROC Star
Posts: 220

Re: Imputing missing value with Average value in Array

Instead of making a temporary holding array as suggested by @ballardw, I would just remember the first value for imputation:

data want;
  set have;
  array visits visit:;
  do _N_=1 to dim(visits)-1;
    if not missing(visits(_N_)) then
      _first=visits(_N_); /* save as first value for imputation */
    else if not missing(_first) then do;
      do _I_=_N_+1 to dim(visits) until(not missing(visits(_I_)));
        end;
      if _I_<=dim(visits) then do;
        visits(_N_)=(_first+visits(_I_))/2;
        end;
      end;
    end;
  drop _:;
run;

 

Ask a Question
Discussion stats
  • 3 replies
  • 111 views
  • 0 likes
  • 4 in conversation