New Contributor
Posts: 2

# Imputing missing value with Average value in Array

Hello All,

I have data like -

 PT Visit1 Visit2 Visit3 Visit4 Visit5 1001 2 . 5 7 11 1002 5 7 . . 9 1003 . 5 6 11 8 1004 7 7 11 . . 1005 . 11 8 9 .

I want to use Array to impute the missing value with the average of adjacent non-missing value.

E.g. = for Pt. 1001 - for missing Visit2 I would expect the  average of (Visit1+Visit3)/2

Same for Pt 1002  - for missing Visit3 and Visit 4, I would expect the  average of (Visit2+Visit5)/2 .

In later cases, I dont want imputation as I dont have adjacent values present completely.

Does anyone know how to handle this case in array?

Thanks,

PankP

Super User
Posts: 23,778

## Re: Imputing missing value with Average value in Array

Can you transpose the data and then use a proc instead of dealing with it in a wide format?

I think PROC TIMESERIES or STDIZE will fill this in automatically if you make it a long format.

Pankp wrote:

Hello All,

I have data like -

 PT Visit1 Visit2 Visit3 Visit4 Visit5 1001 2 . 5 7 11 1002 5 7 . . 9 1003 . 5 6 11 8 1004 7 7 11 . . 1005 . 11 8 9 .

I want to use Array to impute the missing value with the average of adjacent non-missing value.

E.g. = for Pt. 1001 - for missing Visit2 I would expect the  average of (Visit1+Visit3)/2

Same for Pt 1002  - for missing Visit3 and Visit 4, I would expect the  average of (Visit2+Visit5)/2 .

In later cases, I dont want imputation as I dont have adjacent values present completely.

Does anyone know how to handle this case in array?

Thanks,

PankP

Super User
Posts: 13,583

## Re: Imputing missing value with Average value in Array

Here's one way that works for your example data:

```data have;
input PT Visit1  Visit2 Visit3 Visit4 Visit5 ;
datalines;
1001 2 . 5 7 11
1002 5 7 . . 9
1003 . 5 6 11 8
1004 7 7 11 . .
1005 . 11 8 9 .
;
run;

data want;
set have;
array v visit1-visit5 ;
array newv {5} _temporary_ ;
/* copy values into temp holding array newv*/
do i=1 to dim(v);
newv[i]=v[i];
end;
/* if the first or last of the array are
missing we aren't imputing
*/
do i = 2 to (dim(v)-1);
if missing(v[i]) then do;
/* attempt to find value prior to missing*/
do j= (i-1) to 1 by -1;
if not missing(v[j]) then do;
lower=v[j];
leave;
end;
end;
/* attempt to find value after missing*/
do j= (i+1) to dim(v);
if not missing(v[j]) then do;
upper=v[j];
leave;
end;
end;
if missing(lower) or missing(upper) then;
else  newv[i] = mean(lower,upper);
end;/*missing v[i]*/
end;
/* copy the newv to v*/
do i=1 to dim(v);
v[i]=newv[i];
end;

drop i j lower upper ;
run;

```

The temporary array is used as an easy way to hold the desired values but doesn't run into issues with your second line of data with two successive missing values where use of a single array would result in using the first imputed as the lower indexed value.

PROC Star
Posts: 266

## Re: Imputing missing value with Average value in Array

Instead of making a temporary holding array as suggested by @ballardw, I would just remember the first value for imputation:

```data want;
set have;
array visits visit:;
do _N_=1 to dim(visits)-1;
if not missing(visits(_N_)) then
_first=visits(_N_); /* save as first value for imputation */
else if not missing(_first) then do;
do _I_=_N_+1 to dim(visits) until(not missing(visits(_I_)));
end;
if _I_<=dim(visits) then do;
visits(_N_)=(_first+visits(_I_))/2;
end;
end;
end;
drop _:;
run;

```

Discussion stats
• 3 replies
• 128 views
• 0 likes
• 4 in conversation