Solved: Re: Impute Missing Values

newbie_ari · Posted 10-31-2017 07:42 AM

My data is in this format:

data have;
input GROUP1 $ GROUP2 $ GROUP3 $ VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 .
A A1 A12 15 16
A A2 A12 . 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 . .
B B2 B13 5 3
B B2 B13 . .
;
RUN;

And I want the result in this format:

data desired;
input GROUP1 $ GROUP2 $ GROUP3 $ VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 12
A A1 A12 15 16
A A2 A12 15 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 31 40
B B2 B13 5 3
B B2 B13 5 3
;
RUN;

I want to impute the missing values by carrying forward the previous value from the row above in the same group. E.g. for the group combination (B, B2, B13), the last row has missing values for VAR1 and VAR2. These should be filled with 5 and 3, the values from the row above in the same group.
I tried the approach below, but it can only fill one variable at a time.

data imputedData;
  set mydata;
  n = _n_;
  if missing(Value1) then do;
    do until (not missing(Value1));
      n = n - 1;
      set mydata(keep=Value1) point=n; *second SET statement;
    end;
  end;
run;

The problem is that it handles only one variable at a time, and it also takes a long time to run. My dataset has around 400K rows.

Could someone please guide me on how to do this?
Thanks!

Ksharp · Posted 10-31-2017 08:43 AM


data have;
input Var1 $ Var2 $ Var3 $ Value1 Imputer_Value1 Value2 Imputer_Value2;
cards;
A A1 A11 6 6 15 15
A A1 A11 9 9 14 14
A A1 A12 1 1 19 19
A A2 A12 . 1 16 16
A A2 A13 10 10 13 13
A A2 A13 4 4 . 13
B B1 B11 8 8 13 13
B B1 B11 9 9 17 17
B B1 B12 5 5 18 18
B B2 B12 . 5 12 12
B B2 B13 2 2 20 20
B B2 B13 1 1 . 20
;
run;
data want;
 update have(obs=0) have;
 by var1 var2 var3;
 output;
run;
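
For anyone wondering why this works, here is my understanding, offered as a sketch rather than an official explanation. UPDATE applies each transaction row to a master record within the BY group, and by default a missing value in the transaction does not overwrite the value the master already holds. With an empty master (obs=0), the first row of each BY group starts the master record, later rows update it, and the explicit OUTPUT writes the running record once per input row. The names demo, id, x and y below are made up for illustration.

```sas
/* Tiny illustration; demo, id, x and y are hypothetical names */
data demo;
  input id $ x y;
  cards;
A 1 10
A . 20
A 3 .
B . 5
B 7 .
;
run;

data demo_filled;
  /* Empty master (obs=0) plus the full data set as the transaction.     */
  /* Missing transaction values leave the current master value in place. */
  update demo(obs=0) demo;
  by id;
  /* Without OUTPUT, UPDATE would write only one row per BY group. */
  output;
run;

proc print data=demo_filled noobs;
run;
```

The second A row should pick up x=1 and the third A row y=20. The first B row keeps its missing x, because there is no earlier B row to copy from.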

RW9 · Posted 10-31-2017 08:39 AM

Please post test data in the form of a datastep - use the {i} to post code! That way we can run the datastep and have some data to write code against, rather than typing in test data or trying to read unformatted text. Also post an example of what you want out.

Astounding · Posted 10-31-2017 09:37 AM

This is a good approach. It fills in for all variables, and you don't even need to know the variable names.

Looking at your data on lines 3 and 4, it looks like you want to fill in values within GROUP1 regardless of whether GROUP2 or GROUP3 changes. So your BY statement might be shorter:

by group1;
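
To make the difference concrete, here is a small sketch that uses only the first four rows of the posted data. The dataset names sample, by_all_levels and by_group1 are made up for this illustration.

```sas
/* First four rows of the sample data, enough to show the difference */
data sample;
  input GROUP1 $ GROUP2 $ GROUP3 $ VAR1 VAR2;
  cards;
A A1 A11 10 12
A A1 A11 13 .
A A1 A12 15 16
A A2 A12 . 17
;
run;

/* BY all three levels: row 4 starts a new BY group, so VAR1 stays missing */
data by_all_levels;
  update sample(obs=0) sample;
  by GROUP1 GROUP2 GROUP3;
  output;
run;

/* BY GROUP1 only: row 4 is still inside group A, so VAR1 picks up 15 */
data by_group1;
  update sample(obs=0) sample;
  by GROUP1;
  output;
run;
```

In both versions the missing VAR2 on row 2 is filled with 12, since rows 1 and 2 share the same group at every level. Only row 4 differs between the two.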

newbie_ari · Posted 10-31-2017 02:12 PM

@Astounding: I didn't get your answer. My lowest level is Group3. So you are saying I should use a BY statement in my code? The code I used works fine for a small dataset, but on a dataset of my size it takes over 3 hours. Also, it fills in values one variable at a time, so the entire process is extremely time consuming. Is there any other alternative?

Astounding · Posted 10-31-2017 02:25 PM

Sorry, when I said "good approach" I was talking about KSharp's solution.

The question I was trying to address is this. When should the previous value be carried forward? When GROUP1 changes from "A" to "B", is it still OK to carry a value forward from an "A" observation to a "B" observation? (I suspect the answer is no.)

How about when GROUP1 remains "A", but GROUP2 changes from "A1" to "A2"? Is it permissible to carry forward a value from "A1" to "A2" observations? I suspect the answer is yes, because in your original example you do exactly that. (Refer to lines 3 and 4 of the data.)

So if my suspected answers are correct, and if your data is already sorted as indicated, the program becomes:

data ImputedData;
  update mydata (obs=0) mydata;
  by group1;
  output;
run;
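
If the data is not already grouped by GROUP1, UPDATE will stop with a BY-order error, so a sort is needed first. The sketch below assumes the original row order within each group is the order you want to carry values forward in. The dataset names mydata_unsorted and mydata_sorted, and the four sample rows, are hypothetical.

```sas
/* Hypothetical out-of-order sample */
data mydata_unsorted;
  input GROUP1 $ GROUP2 $ GROUP3 $ VAR1 VAR2;
  cards;
B B1 B11 15 21
A A1 A11 10 12
B B1 B11 . 29
A A1 A11 13 .
;
run;

/* EQUALS keeps the original relative order of rows within each GROUP1, */
/* which matters because the fill depends on row order.                 */
proc sort data=mydata_unsorted out=mydata_sorted equals;
  by group1;
run;

data ImputedData;
  update mydata_sorted(obs=0) mydata_sorted;
  by group1;
  output;
run;
```

Sorting by GROUP1 alone, rather than by all three levels, avoids reshuffling rows inside a group.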

newbie_ari · Posted 11-07-2017 05:28 AM

@Astounding: Thanks for the clarification!
