Hi guys,
I'm trying to impute outcome2 using the regression method for a single imputation. I know that I have to run a regression model, as shown in the code below, to get the coefficients to build my model. Do you know how to convert the current regression model into a loop, so that the regressed values are imputed into the missing slots?
Thanks a lot for your help in advance.
data support;
input id treatment gender age duration baseline outcome2 outcome4;
cards;
1 1 1 32.59 7.589041 4 -3 -3
2 1 2 37.51 13.50959 5 -1 -1
3 1 1 52.87 27.86849 6 . .
4 1 1 34.35 6.347945 3 -3 -3
5 1 2 30.13 5.131507 5 -4 -4
6 1 2 30.12 7.115068 5 . .
7 1 2 32.75 7.753425 9 -3 -3
8 1 1 30.9 7.89863 3 . .
9 1 1 31.09 6.087671 6 -6 -6
10 1 2 30.61 3.605479 4 -3.5 -3.5
11 1 1 28.93 3.926027 5 -3 -3
12 1 2 34.1 5.10137 3 -1 -1
13 0 1 30.33 2.334247 2 -1 -1
14 0 2 32.5 5.504109 5 -1 -1
15 0 2 32.27 8.273973 5 . .
16 0 2 36.73 11.73151 1 1 1
17 0 2 61.06 41.06301 8 . .
;
proc reg data=support;
model outcome2 outcome4=treatment age gender duration baseline;
run;
As PaigeMiller pointed out, I think you are using the wrong terminology. Outcome2 is a response variable, so you do not "impute" its values; you "predict" them by scoring the model. For your example, the output data set contains predicted values for the response variables:
proc reg data=support plots=none;
model outcome2 outcome4=treatment age gender duration baseline;
output out=RegOut P=Pred2 Pred4;
quit;
proc print data=RegOut;
var ID OutCome2 Pred2 Outcome4 Pred4;
run;
If this is not what you want, please explain further.
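If the goal really is to fill only the missing slots with the regression predictions (a deterministic single imputation), no loop is needed. The following is a minimal sketch, assuming the `support` data set from the question; the names `RegImp`, `support_imputed`, `Pred2`, and `outcome2_imp` are made up for illustration.

```sas
/* Fit the model on the complete cases; PROC REG still writes a     */
/* predicted value for rows where outcome2 is missing, as long as   */
/* all regressors are present.                                      */
proc reg data=support plots=none;
   model outcome2 = treatment age gender duration baseline;
   output out=RegImp p=Pred2;
quit;

/* Keep the observed value where it exists; otherwise use the       */
/* prediction. COALESCE returns its first nonmissing argument.      */
data support_imputed;
   set RegImp;
   outcome2_imp = coalesce(outcome2, Pred2);
run;

proc print data=support_imputed;
   var id outcome2 Pred2 outcome2_imp;
run;
```

Keeping the filled-in values in a separate variable (`outcome2_imp`) leaves the original `outcome2` intact, so it stays clear which values were observed and which were predicted.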
A lot of deciding how to handle missing values depends on understanding the subject matter and understanding the goals of the analysis, none of which we know, and so the best way to handle the missings is usually up to you.
In your case, the missing values are in the Y variables of the regression. Generally those are not imputed (normally you would impute only the x-variables when they are missing), so these observations would not be used in the regression. But even so, if you want values for the Y variables, then see paragraph 1.
Can I use outcome2 to predict outcome4?
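No. Just look at the data: outcome2 and outcome4 are missing in exactly the same observations, and they are identical wherever they are present. There is no way that outcome2 can be used to impute values to replace the missings of outcome4.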
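One way to check the missingness pattern without reading the rows by eye is to cross-tabulate the two outcomes as Missing/Present. This is only a sketch, assuming the `support` data set from the question; the format name `missfmt` is made up for illustration.

```sas
/* Collapse each numeric value into Missing or Present. */
proc format;
   value missfmt . = 'Missing'
             other = 'Present';
run;

/* Cross-tabulate the two outcomes. The MISSING option makes       */
/* PROC FREQ treat missing as a regular level instead of dropping  */
/* those rows from the table.                                      */
proc freq data=support;
   tables outcome2*outcome4 / missing norow nocol nopercent;
   format outcome2 outcome4 missfmt.;
run;
```

If the only nonzero cells are Missing/Missing and Present/Present, then neither outcome is ever observed when the other is missing, so one cannot be used to fill in the other.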
Actually, it did not occur to me that perhaps the question is how to predict outcome2 and outcome4 in this situation.
@Cruise, is that what you want, predictions of outcome2 and outcome4 based upon the fitted model (which would only use observations with no missing values)?