BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pankp
Fluorite | Level 6

 Hello All,

 

I have data like -

 

PTVisit1 Visit2Visit3Visit4Visit5
10012.5711
100257..9
1003.56118
10047711..
1005.1189.

 

I want to use Array to impute the missing value with the average of adjacent non-missing value.

 

E.g. = for Pt. 1001 - for missing Visit2 I would expect the  average of (Visit1+Visit3)/2 

Same for Pt 1002  - for missing Visit3 and Visit 4, I would expect the  average of (Visit2+Visit5)/2 .

 

In later cases, I dont want imputation as I dont have adjacent values present completely.

 

Does anyone know how to handle this case in array?

 

Thanks,

PankP

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

Instead of making a temporary holding array as suggested by @ballardw, I would just remember the first value for imputation:

data want;
  set have;
  array visits visit:;
  do _N_=1 to dim(visits)-1;
    if not missing(visits(_N_)) then
      _first=visits(_N_); /* save as first value for imputation */
    else if not missing(_first) then do;
      do _I_=_N_+1 to dim(visits) until(not missing(visits(_I_)));
        end;
      if _I_<=dim(visits) then do;
        visits(_N_)=(_first+visits(_I_))/2;
        end;
      end;
    end;
  drop _:;
run;

 

View solution in original post

5 REPLIES 5
Reeza
Super User

Can you transpose the data and then use a proc instead of dealing with it in a wide format?

 

I think PROC TIMESERIES or STDIZE will fill this in automatically if you make it a long format. 

 


@Pankp wrote:

 Hello All,

 

I have data like -

 

PT Visit1  Visit2 Visit3 Visit4 Visit5
1001 2 . 5 7 11
1002 5 7 . . 9
1003 . 5 6 11 8
1004 7 7 11 . .
1005 . 11 8 9 .

 

I want to use Array to impute the missing value with the average of adjacent non-missing value.

 

E.g. = for Pt. 1001 - for missing Visit2 I would expect the  average of (Visit1+Visit3)/2 

Same for Pt 1002  - for missing Visit3 and Visit 4, I would expect the  average of (Visit2+Visit5)/2 .

 

In later cases, I dont want imputation as I dont have adjacent values present completely.

 

Does anyone know how to handle this case in array?

 

Thanks,

PankP


 

ballardw
Super User

Here's one way that works for your example data:

 

data have;
   input PT Visit1  Visit2 Visit3 Visit4 Visit5 ;
datalines;
1001 2 . 5 7 11 
1002 5 7 . . 9 
1003 . 5 6 11 8 
1004 7 7 11 . . 
1005 . 11 8 9 . 
;
run;

data want;
   set have;
   array v visit1-visit5 ;
   array newv {5} _temporary_ ;
   /* copy values into temp holding array newv*/
   do i=1 to dim(v);
      newv[i]=v[i];
   end;
   /* if the first or last of the array are 
      missing we aren't imputing
   */
   do i = 2 to (dim(v)-1);
      if missing(v[i]) then do;
         /* attempt to find value prior to missing*/
         do j= (i-1) to 1 by -1;
            if not missing(v[j]) then do;
               lower=v[j];
               leave;
            end;
         end;
         /* attempt to find value after missing*/
         do j= (i+1) to dim(v);
            if not missing(v[j]) then do;
               upper=v[j];
               leave;
            end;
         end;
         if missing(lower) or missing(upper) then;
         else  newv[i] = mean(lower,upper);
      end;/*missing v[i]*/
   end;
   /* copy the newv to v*/
   do i=1 to dim(v);
      v[i]=newv[i];
   end;       

   drop i j lower upper ;
run;


The temporary array is used as an easy way to hold the desired values but doesn't run into issues with your second line of data with two successive missing values where use of a single array would result in using the first imputed as the lower indexed value.

 

Pankp
Fluorite | Level 6

Thanks Ballardw! it helps . Cheers!

s_lassen
Meteorite | Level 14

Instead of making a temporary holding array as suggested by @ballardw, I would just remember the first value for imputation:

data want;
  set have;
  array visits visit:;
  do _N_=1 to dim(visits)-1;
    if not missing(visits(_N_)) then
      _first=visits(_N_); /* save as first value for imputation */
    else if not missing(_first) then do;
      do _I_=_N_+1 to dim(visits) until(not missing(visits(_I_)));
        end;
      if _I_<=dim(visits) then do;
        visits(_N_)=(_first+visits(_I_))/2;
        end;
      end;
    end;
  drop _:;
run;

 

Pankp
Fluorite | Level 6

Thanks s_lassen for the alternate solution. Cheers! 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1131 views
  • 0 likes
  • 4 in conversation