My data is in this format:
data have;
input GROUP1 GROUP2 GROUP3 VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 .
A A1 A12 15 16
A A2 A12 . 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 . .
B B2 B13 5 3
B B2 B13 . .
;
RUN;
And I want the result in this format:
data desired;
input GROUP1 GROUP2 GROUP3 VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 12
A A1 A12 15 16
A A2 A12 15 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 31 31
B B2 B13 5 3
B B2 B13 5 3
;
RUN;
I want to impute the missing values by replicating the previous value from the same above group. E.g. for group combinations (B, B2, B13), there's a missing value for variable "Value2". This value will be replicated with 20, which is the above value from the same group.
I tried to follow this option. But this does not have multiple value computation option.
data imputedData;
set mydata;
n=_n_;
if missing(Value1) then
do;
do until (not missing(value1));
n=n-1;
set mydata(keep=Value1) point=n; *second SET statement;
end;
end;
run;
But the problem is it can calculate only 1 variable at a time. And also it was taking a long time to compute. My dataset is around 400K number of rows.
If someone please guide me how to do it.
Thanks!
data have; input Var1 $ Var2 $ Var3 $ Value1 Imputer_Value1 Value2 Imputer_Value2; cards; A A1 A11 6 6 15 15 A A1 A11 9 9 14 14 A A1 A12 1 1 19 19 A A2 A12 1 16 16 A A2 A13 10 10 13 13 A A2 A13 4 4 . 13 B B1 B11 8 8 13 13 B B1 B11 9 9 17 17 B B1 B12 5 5 18 18 B B2 B12 . 5 12 12 B B2 B13 2 2 20 20 B B2 B13 1 1 . 20 ; run; data want; update have(obs=0) have; by var1 var2 var3; output; run; <.pre>
Please post test data in the form of a datastep - use the {i} to post code! This is so we can run the datastep and have soe data to write code on, not here to type in test data or try to read unformatted text. Also post example of what you want out.
data have; input Var1 $ Var2 $ Var3 $ Value1 Imputer_Value1 Value2 Imputer_Value2; cards; A A1 A11 6 6 15 15 A A1 A11 9 9 14 14 A A1 A12 1 1 19 19 A A2 A12 1 16 16 A A2 A13 10 10 13 13 A A2 A13 4 4 . 13 B B1 B11 8 8 13 13 B B1 B11 9 9 17 17 B B1 B12 5 5 18 18 B B2 B12 . 5 12 12 B B2 B13 2 2 20 20 B B2 B13 1 1 . 20 ; run; data want; update have(obs=0) have; by var1 var2 var3; output; run; <.pre>
This is a good approach. It fills in for all variables, and you don't even need to know the variable names.
Looking at your data on lines 3 and 4, it looks like you want to fill in values within GROUP1 regardless of whether GROUP2 or GROUP3 changes. So your BY statement might be shorter:
by group1;
Sorry, when I said "good approach" I was talking about KSharp's solution.
The question I was trying to address is this. When should the previous value be carried forward? When GROUP1 changes from "A" to "B", is it still OK to carry a value forward from an "A" observation to a "B" observation? (I suspect the answer is no.)
How about when GROUP1 remains "A", but GROUP2 changes from "A1" to "A2"? Is it permissible to carry forward a value from "A1" to "A2" observations? I suspect the answer is yes, because in your original example you do exactly that. (Refer to lines 3 and 4 of the data.)
So if my suspected answers are correct, and if your data is already sorted as indicated, the program becomes:
data ImputedData;
update mydata (obs=0) mydata;
by group1;
output;
run;
@Astounding: Thanks for the clarification!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.