DATA Step, Macro, Functions and more

Conditionally replace missing with means

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 93
Accepted Solution

Conditionally replace missing with means

Hi everyone,

 

I am trying to standardize my data by replacing missing with means. However, only when the number of missing for a series of variables is less than half of the number of the series of variables (that is why I could not use proc standard). The data I have looks like below:

id a1 a2 a3 a4 b1 b2 b3 b4 b5
1 1 5 4 1 1 3 2 3 4
2 4 1 3 5 1 4 3 3 1
3 2 3   3 3 3 3 2 3
4 3 3 2 5 2 3 1 1 1
5 3 3 2 4 4 4 1 3 2
6 1   2 1   3 2 3 4
7 3 2 1 4 2 2 4   4
8 3 5 4 3 1 1 3 4 4
9 1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3       1 4 2 4 4
12 4 5 1 5 3 2 2 3 4
13 1 3 4 2 2 2 1 3 1
14 3 5 1 2 3   1 4 4
15 5 2 5 3 1 1 1 1 2
16         2 2 3 4 1
17 4 4 5 1 2 1   1 4
18 4 1 2 2 1 1 4 1 3
19 5 5 1 2 2 1 1 3 3
20 2 4 4 5 3 3 3 2 3

 

I have to leave missing the values of a2, a3, a4 for id#11 and the values of a1-a4 for id #16 because nmiss(of a1-a4) is more than half of the number of variables. The rest of missing values, a3 for id#4,a2 for id#6, b1 for id#6, b4 for id#7, ... can be replaced by the means of the corresponding variable.

 

I start my code like this:

data want;set have;

array avar_all a1-a4;

array bvar_all b1-b5;

DO i=1 to dim(avar_all);

IF nmiss(of a1-a4)<4 THEN DO;

IF avar_all[i]=. THEN avar_all[i]=mean(???);

END;

END;

DO i=1 to dim(bvar_all);

IF nmiss(of b1-b5)<4 THEN DO;

IF bvar_all[i]=. THEN bvar_all[i]=mean(???);

END;

END;

run;

 

Anyone knows what I should put in (???)?

 

Thanks a lot!


Accepted Solutions
Solution
‎08-17-2016 09:23 AM
PROC Star
Posts: 1,561

Re: Conditionally replace missing with means

[ Edited ]

Like this? [Edited as I wasn't computing a column average]

 

 

data HAVE;
input ID $3. (A1 A2 A3 A4 B1 B2 B3 B4 B5) (2.); 
cards;
1  1 5 4 1 1 3 2 3 4 
2  4 1 3 5 1 4 3 3 1 
3  2 3   3 3 3 3 2 3 
4  3 3 2 5 2 3 1 1 1 
5  3 3 2 4 4 4 1 3 2 
6  1   2 1   3 2 3 4 
7  3 2 1 4 2 2 4   4 
8  3 5 4 3 1 1 3 4 4 
9  1 1 2 5 4 2 4 1 3 
10 1 3 5 5 1 1 2 4 2 
11 3       1 4 2 4 4 
12 4 5 1 5 3 2 2 3 4 
13 1 3 4 2 2 2 1 3 1 
14 3 5 1 2 3   1 4 4 
15 5 2 5 3 1 1 1 1 2 
16         2 2 3 4 1 
17 4 4 5 1 2 1   1 4 
18 4 1 2 2 1 1 4 1 3 
19 5 5 1 2 2 1 1 3 3 
20 2 4 4 5 3 3 3 2 3 
run;
proc means data=HAVE(keep=A: B:) noprint; 
  output out=MEANS(drop=_:) mean=/autoname;
run;   
data WANT;
  set HAVE ;
  retain P 1;
  set MEANS point=P;
  array A [*] A1-A4 ;
  array B [*] B1-B5 ;
  keep ID A1-A4 B1-B5 ;
  if 0 < nmiss(of A[*]) < n(of A[*]) then do I=1 to dim(A);
    A[I]= coalesce( A[I], vvaluex(cats('A',I,'_MEAN' )));
  end;
  if 0 < nmiss(of B[*]) < n(of B[*]) then do I=1 to dim(B);
    B[I]= coalesce( B[I], vvaluex(cats('B',I,'_MEAN' )));
  end;
run;

 

Function whichn() cannot find a missing value, so we have to loop through the whole array to replace values sadly.

 

 

 

id a1 a2 a3 a4 b1 b2 b3 b4 b5
1 1 5 4 1 1 3 2 3 4
2 4 1 3 5 1 4 3 3 1
3 2 3 2.82 3 3 3 3 2 3
4 3 3 2 5 2 3 1 1 1
5 3 3 2 4 4 4 1 3 2
6 1 3.24 2 1 2.05 3 2 3 4
7 3 2 1 4 2 2 4 2.63 4
8 3 5 4 3 1 1 3 4 4
9 1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3 . . . 1 4 2 4 4
12 4 5 1 5 3 2 2 3 4

View solution in original post


All Replies
Super Contributor
Posts: 254

Re: Conditionally replace missing with means

 

IF avar_all[i]=. THEN avar_all[i]=mean(???);

 

Use  mean(of avar_all[*]);

 

 

IF bvar_all[i]=. THEN bvar_all[i]=mean(???);

Use  mean(of bvar_all[*]);

 

Frequent Contributor
Posts: 93

Re: Conditionally replace missing with means

Thank you!

 

mean(of avar_all[*]) gives the mean of across-variable, a1-a4. However, I need to replace the missing with the mean of other values of the same variable (just like proc standard does).

 

For example, a3 value for id#3 should be replaced by the mean of a3, 2.82.

Super Contributor
Posts: 254

Re: Conditionally replace missing with means

I don't see how you are getting 2.82

Give the individual values as how you are deriving it.

Solution
‎08-17-2016 09:23 AM
PROC Star
Posts: 1,561

Re: Conditionally replace missing with means

[ Edited ]

Like this? [Edited as I wasn't computing a column average]

 

 

data HAVE;
input ID $3. (A1 A2 A3 A4 B1 B2 B3 B4 B5) (2.); 
cards;
1  1 5 4 1 1 3 2 3 4 
2  4 1 3 5 1 4 3 3 1 
3  2 3   3 3 3 3 2 3 
4  3 3 2 5 2 3 1 1 1 
5  3 3 2 4 4 4 1 3 2 
6  1   2 1   3 2 3 4 
7  3 2 1 4 2 2 4   4 
8  3 5 4 3 1 1 3 4 4 
9  1 1 2 5 4 2 4 1 3 
10 1 3 5 5 1 1 2 4 2 
11 3       1 4 2 4 4 
12 4 5 1 5 3 2 2 3 4 
13 1 3 4 2 2 2 1 3 1 
14 3 5 1 2 3   1 4 4 
15 5 2 5 3 1 1 1 1 2 
16         2 2 3 4 1 
17 4 4 5 1 2 1   1 4 
18 4 1 2 2 1 1 4 1 3 
19 5 5 1 2 2 1 1 3 3 
20 2 4 4 5 3 3 3 2 3 
run;
proc means data=HAVE(keep=A: B:) noprint; 
  output out=MEANS(drop=_:) mean=/autoname;
run;   
data WANT;
  set HAVE ;
  retain P 1;
  set MEANS point=P;
  array A [*] A1-A4 ;
  array B [*] B1-B5 ;
  keep ID A1-A4 B1-B5 ;
  if 0 < nmiss(of A[*]) < n(of A[*]) then do I=1 to dim(A);
    A[I]= coalesce( A[I], vvaluex(cats('A',I,'_MEAN' )));
  end;
  if 0 < nmiss(of B[*]) < n(of B[*]) then do I=1 to dim(B);
    B[I]= coalesce( B[I], vvaluex(cats('B',I,'_MEAN' )));
  end;
run;

 

Function whichn() cannot find a missing value, so we have to loop through the whole array to replace values sadly.

 

 

 

id a1 a2 a3 a4 b1 b2 b3 b4 b5
1 1 5 4 1 1 3 2 3 4
2 4 1 3 5 1 4 3 3 1
3 2 3 2.82 3 3 3 3 2 3
4 3 3 2 5 2 3 1 1 1
5 3 3 2 4 4 4 1 3 2
6 1 3.24 2 1 2.05 3 2 3 4
7 3 2 1 4 2 2 4 2.63 4
8 3 5 4 3 1 1 3 4 4
9 1 1 2 5 4 2 4 1 3
10 1 3 5 5 1 1 2 4 2
11 3 . . . 1 4 2 4 4
12 4 5 1 5 3 2 2 3 4
Super User
Posts: 17,801

Re: Conditionally replace missing with means

@liziwu I'm curious whats the logic between looking at the number of missing row wise, but then replacing the value column wise.

Frequent Contributor
Posts: 93

Re: Conditionally replace missing with means

Thanks, Reeza.
It is a survey data I'm working on. For each measurement, there are multiple items. For example, a1-a4 is to measure individuals' mental health, and b1-b5 is to measure physical health. The sum of a1-a4 is the final score for a person. Therefore, if one person only completed one item out of four, the person cannot be counted as a valid observation.
Super User
Posts: 17,801

Re: Conditionally replace missing with means

Do multiple steps. 

First idenrify the variables to be standardized by getting the number of missing per column. Create a list of the variable names from that and pass it to proc standardize. Here's a sketch (untested). 

 

Proc means data =have n nmiss stackods;

var a1-a4 b1-b5;

ODS output summary=stats;

run;

 

data var_list;

set stats end=EEof;

length var_list $250.;

retain var_list;

 

if nmiss/n > 0.5 the do;

var_list = catx(" ", var_list, name);

end;

 

if eeof then call symputx('var_list', var_list);

 

run;

 

proc standardize...

var &var_list;

...

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 338 views
  • 1 like
  • 4 in conversation