Suppose I have three variables: Y, X1 and X2. Y and X1 each have 100 observations, but X2 has only, say, 30.
I want to estimate the equation Y=X1*b1+X2*b2 using all the information I have, i.e., I do not want to discard the 70 observations where X2 is missing. How should I write the code?
Can I write it in this way:
proc model data=yx1x2;
parameters b1 b2;
if x2=. then
Behind the scenes, how does SAS process this equation? That is, what algorithm does it use?
Thank you very much.
My constraint is that I cannot impute the missing values. Let's suppose the missingness is random and the variable is numeric. I can actually get the code to run with the specification I mentioned in the initial post, but I am not sure whether the results are reliable.
If you want to capture the fact that the average of the missing x2 values might not be zero, you could try fitting your model this way:
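proc model data=yx1x2;
parameters b0 b1 b2 bz;
/* z is 1 when x2 is missing, 0 otherwise */
z = missing(x2);
/* replace missing x2 with 0 so those rows stay in the fit */
if z then x2=0;
y = b0 + x1*b1 + x2*b2 + z*bz;
fit y;
run;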
Parameter b0 will account for the overall intercept (you may remove it later if it is not significant) and bz will account for the average effect of missing x2 values.
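To make the role of bz concrete, here is a rough sketch of what the specification above implies for the two groups of rows. The error term e and the symbol mu_m (the mean of the unobserved x2 values) are notation introduced here for illustration only; they do not appear in the code.

```latex
% z_i = 1 if x2 is missing for row i, 0 otherwise; missing x2 is set to 0.
\begin{align*}
\text{fitted model:}\quad
  & y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + b_z z_i + e_i \\
\text{rows with } x_2 \text{ observed } (z_i = 0):\quad
  & y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + e_i \\
\text{rows with } x_2 \text{ missing } (z_i = 1,\ x_{2i} := 0):\quad
  & y_i = (b_0 + b_z) + b_1 x_{1i} + e_i
\end{align*}
% If the same relationship holds in the missing rows and their unobserved
% x2 values have mean \mu_m, then roughly
\[
  b_z \approx b_2 \, \mu_m .
\]
```

So b2 is estimated from the 30 complete rows only, while all 100 rows contribute to b1. Note that the approximation for bz assumes x1 and x2 are unrelated; if they are correlated, the estimate of b1 from this kind of indicator specification can be biased, so the results should be read with that in mind.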