## Conditionally replace missing with means

Solved
Frequent Contributor
Posts: 110

# Conditionally replace missing with means

Hi everyone,

I am trying to standardize my data by replacing missing with means. However, only when the number of missing for a series of variables is less than half of the number of the series of variables (that is why I could not use proc standard). The data I have looks like below:

 id a1 a2 a3 a4 b1 b2 b3 b4 b5 1 1 5 4 1 1 3 2 3 4 2 4 1 3 5 1 4 3 3 1 3 2 3 3 3 3 3 2 3 4 3 3 2 5 2 3 1 1 1 5 3 3 2 4 4 4 1 3 2 6 1 2 1 3 2 3 4 7 3 2 1 4 2 2 4 4 8 3 5 4 3 1 1 3 4 4 9 1 1 2 5 4 2 4 1 3 10 1 3 5 5 1 1 2 4 2 11 3 1 4 2 4 4 12 4 5 1 5 3 2 2 3 4 13 1 3 4 2 2 2 1 3 1 14 3 5 1 2 3 1 4 4 15 5 2 5 3 1 1 1 1 2 16 2 2 3 4 1 17 4 4 5 1 2 1 1 4 18 4 1 2 2 1 1 4 1 3 19 5 5 1 2 2 1 1 3 3 20 2 4 4 5 3 3 3 2 3

I have to leave missing the values of a2, a3, a4 for id#11 and the values of a1-a4 for id #16 because nmiss(of a1-a4) is more than half of the number of variables. The rest of missing values, a3 for id#4,a2 for id#6, b1 for id#6, b4 for id#7, ... can be replaced by the means of the corresponding variable.

I start my code like this:

data want;set have;

array avar_all a1-a4;

array bvar_all b1-b5;

DO i=1 to dim(avar_all);

IF nmiss(of a1-a4)<4 THEN DO;

IF avar_all[i]=. THEN avar_all[i]=mean(???);

END;

END;

DO i=1 to dim(bvar_all);

IF nmiss(of b1-b5)<4 THEN DO;

IF bvar_all[i]=. THEN bvar_all[i]=mean(???);

END;

END;

run;

Anyone knows what I should put in (???)?

Thanks a lot!

Accepted Solutions
Solution
‎08-17-2016 09:23 AM
PROC Star
Posts: 2,369

## Re: Conditionally replace missing with means

[ Edited ]

Like this? [Edited as I wasn't computing a column average]

``````data HAVE;
input ID \$3. (A1 A2 A3 A4 B1 B2 B3 B4 B5) (2.);
cards;
1  1 5 4 1 1 3 2 3 4
2  4 1 3 5 1 4 3 3 1
3  2 3   3 3 3 3 2 3
4  3 3 2 5 2 3 1 1 1
5  3 3 2 4 4 4 1 3 2
6  1   2 1   3 2 3 4
7  3 2 1 4 2 2 4   4
8  3 5 4 3 1 1 3 4 4
9  1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3       1 4 2 4 4
12 4 5 1 5 3 2 2 3 4
13 1 3 4 2 2 2 1 3 1
14 3 5 1 2 3   1 4 4
15 5 2 5 3 1 1 1 1 2
16         2 2 3 4 1
17 4 4 5 1 2 1   1 4
18 4 1 2 2 1 1 4 1 3
19 5 5 1 2 2 1 1 3 3
20 2 4 4 5 3 3 3 2 3
run;
proc means data=HAVE(keep=A: B:) noprint;
output out=MEANS(drop=_:) mean=/autoname;
run;
data WANT;
set HAVE ;
retain P 1;
set MEANS point=P;
array A [*] A1-A4 ;
array B [*] B1-B5 ;
keep ID A1-A4 B1-B5 ;
if 0 < nmiss(of A[*]) < n(of A[*]) then do I=1 to dim(A);
A[I]= coalesce( A[I], vvaluex(cats('A',I,'_MEAN' )));
end;
if 0 < nmiss(of B[*]) < n(of B[*]) then do I=1 to dim(B);
B[I]= coalesce( B[I], vvaluex(cats('B',I,'_MEAN' )));
end;
run;
``````

Function whichn() cannot find a missing value, so we have to loop through the whole array to replace values sadly.

id a1 a2 a3 a4 b1 b2 b3 b4 b5
1 1 5 4 1 1 3 2 3 4
2 4 1 3 5 1 4 3 3 1
3 2 3 2.82 3 3 3 3 2 3
4 3 3 2 5 2 3 1 1 1
5 3 3 2 4 4 4 1 3 2
6 1 3.24 2 1 2.05 3 2 3 4
7 3 2 1 4 2 2 4 2.63 4
8 3 5 4 3 1 1 3 4 4
9 1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3 . . . 1 4 2 4 4
12 4 5 1 5 3 2 2 3 4

All Replies
Super Contributor
Posts: 325

## Re: Conditionally replace missing with means

IF avar_all[i]=. THEN avar_all[i]=mean(???);

Use  mean(of avar_all[*]);

IF bvar_all[i]=. THEN bvar_all[i]=mean(???);

Use  mean(of bvar_all[*]);

Frequent Contributor
Posts: 110

## Re: Conditionally replace missing with means

Thank you!

mean(of avar_all[*]) gives the mean of across-variable, a1-a4. However, I need to replace the missing with the mean of other values of the same variable (just like proc standard does).

For example, a3 value for id#3 should be replaced by the mean of a3, 2.82.

Super Contributor
Posts: 325

## Re: Conditionally replace missing with means

I don't see how you are getting 2.82

Give the individual values as how you are deriving it.

Solution
‎08-17-2016 09:23 AM
PROC Star
Posts: 2,369

## Re: Conditionally replace missing with means

[ Edited ]

Like this? [Edited as I wasn't computing a column average]

``````data HAVE;
input ID \$3. (A1 A2 A3 A4 B1 B2 B3 B4 B5) (2.);
cards;
1  1 5 4 1 1 3 2 3 4
2  4 1 3 5 1 4 3 3 1
3  2 3   3 3 3 3 2 3
4  3 3 2 5 2 3 1 1 1
5  3 3 2 4 4 4 1 3 2
6  1   2 1   3 2 3 4
7  3 2 1 4 2 2 4   4
8  3 5 4 3 1 1 3 4 4
9  1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3       1 4 2 4 4
12 4 5 1 5 3 2 2 3 4
13 1 3 4 2 2 2 1 3 1
14 3 5 1 2 3   1 4 4
15 5 2 5 3 1 1 1 1 2
16         2 2 3 4 1
17 4 4 5 1 2 1   1 4
18 4 1 2 2 1 1 4 1 3
19 5 5 1 2 2 1 1 3 3
20 2 4 4 5 3 3 3 2 3
run;
proc means data=HAVE(keep=A: B:) noprint;
output out=MEANS(drop=_:) mean=/autoname;
run;
data WANT;
set HAVE ;
retain P 1;
set MEANS point=P;
array A [*] A1-A4 ;
array B [*] B1-B5 ;
keep ID A1-A4 B1-B5 ;
if 0 < nmiss(of A[*]) < n(of A[*]) then do I=1 to dim(A);
A[I]= coalesce( A[I], vvaluex(cats('A',I,'_MEAN' )));
end;
if 0 < nmiss(of B[*]) < n(of B[*]) then do I=1 to dim(B);
B[I]= coalesce( B[I], vvaluex(cats('B',I,'_MEAN' )));
end;
run;
``````

Function whichn() cannot find a missing value, so we have to loop through the whole array to replace values sadly.

id a1 a2 a3 a4 b1 b2 b3 b4 b5
1 1 5 4 1 1 3 2 3 4
2 4 1 3 5 1 4 3 3 1
3 2 3 2.82 3 3 3 3 2 3
4 3 3 2 5 2 3 1 1 1
5 3 3 2 4 4 4 1 3 2
6 1 3.24 2 1 2.05 3 2 3 4
7 3 2 1 4 2 2 4 2.63 4
8 3 5 4 3 1 1 3 4 4
9 1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3 . . . 1 4 2 4 4
12 4 5 1 5 3 2 2 3 4
Super User
Posts: 23,754

## Re: Conditionally replace missing with means

@liziwu I'm curious whats the logic between looking at the number of missing row wise, but then replacing the value column wise.

Frequent Contributor
Posts: 110

## Re: Conditionally replace missing with means

Thanks, Reeza.
It is a survey data I'm working on. For each measurement, there are multiple items. For example, a1-a4 is to measure individuals' mental health, and b1-b5 is to measure physical health. The sum of a1-a4 is the final score for a person. Therefore, if one person only completed one item out of four, the person cannot be counted as a valid observation.
Super User
Posts: 23,754

## Re: Conditionally replace missing with means

Do multiple steps.

First idenrify the variables to be standardized by getting the number of missing per column. Create a list of the variable names from that and pass it to proc standardize. Here's a sketch (untested).

Proc means data =have n nmiss stackods;

var a1-a4 b1-b5;

ODS output summary=stats;

run;

data var_list;

set stats end=EEof;

length var_list \$250.;

retain var_list;

if nmiss/n > 0.5 the do;

var_list = catx(" ", var_list, name);

end;

if eeof then call symputx('var_list', var_list);

run;

proc standardize...

var &var_list;

...

run;

☑ This topic is solved.