Hello,
I have a specific set of values which I need to impute upon that are dependent on other variables.
for example, I have a variable that is missing for subjects (lets call it var1), and they have values of var2, var3, var4, var5, var6, var7 and var8 present. I want to impute the mean of those 7 variables((aka var1+var2+.../n where n = the number of variables present)) into var1 when ONLY var1 is missing. I want to do this for each other variable as well, when all the other variables are present (so if ONLY var2 is missing, or ONLY var3 is missing).
then, I want to do the same when there are 2 missing but 6 are present, for all combinations (var1, var2 missing, rest present and so on, using the mean as described above but using 6), and then doing the same for when there are combinations of three missing variables.
is there a way to do this using a do loop? I assume there must be, but I don't know how to do it. If anyone can assist, I would appreciate it!
Hi @zfusfeld ,
do you mean something like this:
data have;
input id var1 var2 var3 var4 var5 var6 var7 var8;
cards;
01 . 2 3 4 5 6 7 8
02 1 . 3 4 5 6 7 8
03 1 2 . 4 5 6 7 8
04 1 2 3 . 5 6 7 8
05 1 2 3 4 . 6 7 8
06 1 2 3 4 5 . 7 8
07 1 2 3 4 5 6 . 8
08 1 2 3 4 5 6 7 .
11 . . 3 4 5 6 7 8
12 1 . . 4 5 6 7 8
13 1 2 . . 5 6 7 8
14 1 2 3 . . 6 7 8
15 1 2 3 4 . . 7 8
16 1 2 3 4 5 . . 8
17 1 2 3 4 5 6 . .
18 . 2 . 4 5 6 7 8
19 1 . 3 . 5 6 7 8
20 1 . 3 . 5 . 7 .
;
run;
data want;
set have;
array v var:;
if nmiss(of v[*]) < 4 then
do;
mean = mean(of v[*]);
do over v;
v = coalesce(v,mean);
end;
end;
run;
proc print;
run;
All the best
Bart
Can you post a sample table using datalines for us to use?
id | var1 | var2 | var3 | var4 | var5 | var6 | var7 | var8 | |
1 | 1 | 3 | 3 | 1 | 1 | 3 | 2 | 3 | |
2 | 3 | 1 | 3 | 4 | 1 | 3 | 3 | 3 | |
3 | 2 | 3 | 3 | 3 | 3 | 3 | 2 | ||
4 | 3 | 3 | 2 | 4 | 2 | 3 | 1 | 1 | |
5 | 3 | 3 | 2 | 4 | 4 | 4 | 1 | 3 | |
6 | 1 | 2 | 1 | 3 | 2 | 3 | |||
7 | 3 | 2 | 1 | 4 | 2 | 2 | 4 | ||
8 | 3 | 4 | 4 | 3 | 1 | 1 | 3 | 4 | |
9 | 1 | 1 | 2 | 4 | 4 | 2 | 4 | 1 | |
10 | 1 | 3 | 4 | 4 | 1 | 1 | 2 | 4 | |
11 | 3 | 1 | 4 | 2 | 4 | ||||
12 | 4 | 4 | 1 | 4 | 3 | 2 | 2 | 3 | |
13 | 1 | 3 | 4 | 2 | 2 | 2 | 1 | 3 | |
14 | 3 | 4 | 1 | 2 | 3 | 1 | 4 | ||
15 | 4 | 2 | 4 | 3 | 1 | 1 | 1 | 1 | |
16 | 2 | 2 | 3 | 4 | |||||
17 | 4 | 4 | 4 | 1 | 2 | 1 | 1 | ||
18 | 4 | 1 | 2 | 2 | 1 | 1 | 4 | 1 | |
19 | 4 | 4 | 1 | 2 | 2 | 1 | 1 | 3 | |
20 | 2 | 4 | 4 | 4 | 3 | 3 | 3 | 2 |
so, ignore the extra line of this table, I'm not sure why it's like that.
but basically, in this example, I would want to impute means on any individual who had greater than 4 responses across all variables (so not for someone like 16, but for 17, 11, 6, etc.
what i want to impute is the mean of the other scores available for each subject, so for example for subject 17, for var7 I would impute: (4+4+4+1+2+1+1)/7, and would do that for everyone missing only var7 (or only var1, or only var2, etc). for someone missing only 2 variables, like in subject 6, I would impute the average score of that person over the 6 variables that are present, and would do the same for those missing only 3 variables.
does that clarify?
oh sorry i just realized this wasnt with datalines. hold on.
data HAVE;
input ID $3. (var1 var2 var3 var4 var5 var6 var7 var8) (2.);
cards;
1 1 4 4 1 1 3 2 3
2 4 1 3 4 1 4 3 3
3 2 3 3 3 3 3 2
4 3 3 2 4 2 3 1 1
5 3 3 2 4 4 4 1 3
6 1 2 1 3 2 3
7 3 2 1 4 2 2 4
8 3 4 4 3 1 1 3 4
9 1 1 2 4 4 2 4 1
10 1 3 4 4 1 1 2 4
11 3 1 4 2 4
12 4 4 1 4 3 2 2 3
13 1 3 4 2 2 2 1 3
14 3 4 1 2 3 1 4
15 4 2 4 3 1 1 1 1
16 2 2 3 4
17 4 4 4 1 2 1 1
18 4 1 2 2 1 1 4 1
19 4 4 1 2 2 1 1 3
20 2 4 4 4 3 3 3 2
run;
Hi @zfusfeld ,
do you mean something like this:
data have;
input id var1 var2 var3 var4 var5 var6 var7 var8;
cards;
01 . 2 3 4 5 6 7 8
02 1 . 3 4 5 6 7 8
03 1 2 . 4 5 6 7 8
04 1 2 3 . 5 6 7 8
05 1 2 3 4 . 6 7 8
06 1 2 3 4 5 . 7 8
07 1 2 3 4 5 6 . 8
08 1 2 3 4 5 6 7 .
11 . . 3 4 5 6 7 8
12 1 . . 4 5 6 7 8
13 1 2 . . 5 6 7 8
14 1 2 3 . . 6 7 8
15 1 2 3 4 . . 7 8
16 1 2 3 4 5 . . 8
17 1 2 3 4 5 6 . .
18 . 2 . 4 5 6 7 8
19 1 . 3 . 5 6 7 8
20 1 . 3 . 5 . 7 .
;
run;
data want;
set have;
array v var:;
if nmiss(of v[*]) < 4 then
do;
mean = mean(of v[*]);
do over v;
v = coalesce(v,mean);
end;
end;
run;
proc print;
run;
All the best
Bart
Very nice way to do it. I was going to write an IF-THEN inside a loop. Yours is much cleaner.
- Peter
your solution is incredibly elegant and worked absolutely BEAUTIFULLY. Thank you very much!!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.