Hello All,
I have data like -
PT | Visit1 | Visit2 | Visit3 | Visit4 | Visit5 |
1001 | 2 | . | 5 | 7 | 11 |
1002 | 5 | 7 | . | . | 9 |
1003 | . | 5 | 6 | 11 | 8 |
1004 | 7 | 7 | 11 | . | . |
1005 | . | 11 | 8 | 9 | . |
I want to use Array to impute the missing value with the average of adjacent non-missing value.
E.g. = for Pt. 1001 - for missing Visit2 I would expect the average of (Visit1+Visit3)/2
Same for Pt 1002 - for missing Visit3 and Visit 4, I would expect the average of (Visit2+Visit5)/2 .
In later cases, I dont want imputation as I dont have adjacent values present completely.
Does anyone know how to handle this case in array?
Thanks,
PankP
Instead of making a temporary holding array as suggested by @ballardw, I would just remember the first value for imputation:
data want; set have; array visits visit:; do _N_=1 to dim(visits)-1; if not missing(visits(_N_)) then _first=visits(_N_); /* save as first value for imputation */ else if not missing(_first) then do; do _I_=_N_+1 to dim(visits) until(not missing(visits(_I_))); end; if _I_<=dim(visits) then do; visits(_N_)=(_first+visits(_I_))/2; end; end; end; drop _:; run;
Can you transpose the data and then use a proc instead of dealing with it in a wide format?
I think PROC TIMESERIES or STDIZE will fill this in automatically if you make it a long format.
@Pankp wrote:
Hello All,
I have data like -
PT Visit1 Visit2 Visit3 Visit4 Visit5 1001 2 . 5 7 11 1002 5 7 . . 9 1003 . 5 6 11 8 1004 7 7 11 . . 1005 . 11 8 9 .
I want to use Array to impute the missing value with the average of adjacent non-missing value.
E.g. = for Pt. 1001 - for missing Visit2 I would expect the average of (Visit1+Visit3)/2
Same for Pt 1002 - for missing Visit3 and Visit 4, I would expect the average of (Visit2+Visit5)/2 .
In later cases, I dont want imputation as I dont have adjacent values present completely.
Does anyone know how to handle this case in array?
Thanks,
PankP
Here's one way that works for your example data:
data have; input PT Visit1 Visit2 Visit3 Visit4 Visit5 ; datalines; 1001 2 . 5 7 11 1002 5 7 . . 9 1003 . 5 6 11 8 1004 7 7 11 . . 1005 . 11 8 9 . ; run; data want; set have; array v visit1-visit5 ; array newv {5} _temporary_ ; /* copy values into temp holding array newv*/ do i=1 to dim(v); newv[i]=v[i]; end; /* if the first or last of the array are missing we aren't imputing */ do i = 2 to (dim(v)-1); if missing(v[i]) then do; /* attempt to find value prior to missing*/ do j= (i-1) to 1 by -1; if not missing(v[j]) then do; lower=v[j]; leave; end; end; /* attempt to find value after missing*/ do j= (i+1) to dim(v); if not missing(v[j]) then do; upper=v[j]; leave; end; end; if missing(lower) or missing(upper) then; else newv[i] = mean(lower,upper); end;/*missing v[i]*/ end; /* copy the newv to v*/ do i=1 to dim(v); v[i]=newv[i]; end; drop i j lower upper ; run;
The temporary array is used as an easy way to hold the desired values but doesn't run into issues with your second line of data with two successive missing values where use of a single array would result in using the first imputed as the lower indexed value.
Thanks Ballardw! it helps . Cheers!
Instead of making a temporary holding array as suggested by @ballardw, I would just remember the first value for imputation:
data want; set have; array visits visit:; do _N_=1 to dim(visits)-1; if not missing(visits(_N_)) then _first=visits(_N_); /* save as first value for imputation */ else if not missing(_first) then do; do _I_=_N_+1 to dim(visits) until(not missing(visits(_I_))); end; if _I_<=dim(visits) then do; visits(_N_)=(_first+visits(_I_))/2; end; end; end; drop _:; run;
Thanks s_lassen for the alternate solution. Cheers!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.