BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
newbie_ari
Fluorite | Level 6

My data is in this format:

data have;
input GROUP1 GROUP2 GROUP3 VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 .
A A1 A12 15 16
A A2 A12 . 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 . .
B B2 B13 5 3
B B2 B13 . .
;
RUN;

And I want the result in this format:

data desired;
input GROUP1 GROUP2 GROUP3 VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 12
A A1 A12 15 16
A A2 A12 15 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 31 31
B B2 B13 5 3
B B2 B13 5 3
;
RUN;


I want to impute the missing values by replicating the previous value from the same above group. E.g. for group combinations (B, B2, B13), there's a missing value for variable "Value2". This value will be replicated with 20, which is the above value from the same group.
I tried to follow this option. But this does not have multiple value computation option.

data imputedData;
set mydata;
n=_n_;
if missing(Value1) then
do;
do until (not missing(value1));
n=n-1;
set mydata(keep=Value1) point=n; *second SET statement;
end;
end;
run;

But the problem is it can calculate only 1 variable at a time. And also it was taking a long time to compute. My dataset is around 400K number of rows.

If someone please guide me how to do it.
Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

data have;
input Var1 $ Var2 $ Var3 $ Value1 Imputer_Value1 Value2 Imputer_Value2;
cards;
A A1 A11 6 6 15 15
A A1 A11 9 9 14 14
A A1 A12 1 1 19 19
A A2 A12 1 16 16
A A2 A13 10 10 13 13
A A2 A13 4 4 . 13
B B1 B11 8 8 13 13
B B1 B11 9 9 17 17
B B1 B12 5 5 18 18
B B2 B12 . 5 12 12
B B2 B13 2 2 20 20
B B2 B13 1 1 . 20
;
run;
data want;
 update have(obs=0) have;
 by var1 var2 var3;
 output;
run;
<.pre>

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post test data in the form of a datastep - use the {i} to post code!  This is so we can run the datastep and have soe data to write code on, not here to type in test data or try to read unformatted text.  Also post example of what you want out.  

Ksharp
Super User

data have;
input Var1 $ Var2 $ Var3 $ Value1 Imputer_Value1 Value2 Imputer_Value2;
cards;
A A1 A11 6 6 15 15
A A1 A11 9 9 14 14
A A1 A12 1 1 19 19
A A2 A12 1 16 16
A A2 A13 10 10 13 13
A A2 A13 4 4 . 13
B B1 B11 8 8 13 13
B B1 B11 9 9 17 17
B B1 B12 5 5 18 18
B B2 B12 . 5 12 12
B B2 B13 2 2 20 20
B B2 B13 1 1 . 20
;
run;
data want;
 update have(obs=0) have;
 by var1 var2 var3;
 output;
run;
<.pre>
Astounding
PROC Star

This is a good approach.  It fills in for all variables, and  you don't even need to know the variable names.

 

Looking at your data on lines 3 and 4, it looks like you want to fill in values within GROUP1 regardless of whether GROUP2 or GROUP3 changes.  So your BY statement might be shorter:

 

by group1;

newbie_ari
Fluorite | Level 6
@Astounding: I didn't get your answer. My lowest level is Group3. So you are saying I should use by statement in my code? The code I used works fine for small dataset. But the dataset of my size it's takes over 3 hours. Also, it calculates value I variable at a time. So the entire process is extremely time consuming. Is there any other alternative?
Astounding
PROC Star

Sorry, when I said "good approach" I was talking about KSharp's solution.

 

The question I was trying to address is this.  When should the previous value be carried forward?  When GROUP1 changes from "A" to "B", is it still OK to carry a value forward from an "A" observation to a "B" observation?  (I suspect the answer is no.)

 

How about when GROUP1 remains "A", but GROUP2 changes from "A1" to "A2"?  Is it permissible to carry forward a value from "A1" to "A2" observations?  I suspect the answer is yes, because in your original example you do exactly that.  (Refer to lines 3 and 4 of the data.)

 

So if my suspected answers are correct, and if your data is already sorted as indicated, the program becomes:

 

data ImputedData;

update mydata (obs=0) mydata;

by group1;

output;

run;

newbie_ari
Fluorite | Level 6

@Astounding: Thanks for the clarification! 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 905 views
  • 3 likes
  • 4 in conversation