bigbigben
Obsidian | Level 7

Suppose I have three variables: Y, X1 and X2. Y and X1 each have 100 observations, but X2 has only, say, 30.

I want to estimate the equation Y = X1*b1 + X2*b2 using all the information I have, i.e., without discarding the 70 observations where X2 is missing. How should I write the code?

Can I write it this way:

proc model data=yx1x2;
   parameters b1 b2;
   /* drop the x2 term when x2 is missing */
   if x2=. then
      eq1=y-x1*b1;
   else
      eq1=y-x1*b1-x2*b2;
   fit eq1;
run;

Behind the scenes, how does SAS process this equation? In other words, what algorithm does it use?

Thank you very much.

1 ACCEPTED SOLUTION

kessler
SAS Employee

Using the above code PROC MODEL will effectively estimate a two variable linear model with no intercept term using OLS where all the missing values of X2 have been imputed to be zero.
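
To see why the conditional code is equivalent to imputing zeros, here is a short derivation. It is a sketch that assumes PROC MODEL minimizes the sum of squared residuals of eq1, as described above. The index sets $O$ (rows where X2 is observed) and $M$ (rows where X2 is missing) and the variable $\tilde{x}_{2i}$ are notation introduced here for illustration.

```latex
\begin{align*}
r_i(b_1, b_2) &=
  \begin{cases}
    y_i - x_{1i} b_1 - x_{2i} b_2, & i \in O,\\
    y_i - x_{1i} b_1,              & i \in M,
  \end{cases}
\qquad
\tilde{x}_{2i} =
  \begin{cases}
    x_{2i}, & i \in O,\\
    0,      & i \in M,
  \end{cases}
\\[4pt]
r_i(b_1, b_2) &= y_i - x_{1i} b_1 - \tilde{x}_{2i} b_2
  \quad \text{for every } i,
\\[4pt]
S(b_1, b_2) &= \sum_{i \in O \cup M} r_i(b_1, b_2)^2
  = \sum_{i=1}^{100} \left( y_i - x_{1i} b_1 - \tilde{x}_{2i} b_2 \right)^2 .
\end{align*}
```

The last line is the ordinary least squares objective for a regression of Y on X1 and the zero-filled X2 with no intercept, so the IF/ELSE form and explicit zero imputation lead to the same estimates of b1 and b2.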


6 REPLIES
Reeza
Super User

1. Can you impute your missing data? SAS has procedures for that.

2. Is the data missing at random or systematically, and is the variable continuous or categorical? If categorical, can you include "Missing" as a category?

bigbigben
Obsidian | Level 7

My constraint is that I cannot impute the missing values. Let's suppose the values are missing at random and the variable is numeric. The code does run with the specification from my initial post, but I am not sure whether the results are reliable.

bigbigben
Obsidian | Level 7

Thanks, kessler. You are right. That is how SAS does it behind the scenes.

PGStats
Opal | Level 21

If you want to capture the fact that the average of the missing x2 values might not be zero, you could try fitting your model this way:

proc model data=yx1x2;
   parameters b0 b1 b2 bz;
   z = missing(x2);      /* 1 when x2 is missing, 0 otherwise */
   if z then x2 = 0;     /* zero out missing x2 so the row is kept */
   y = b0 + x1*b1 + x2*b2 + z*bz;
   fit y;
run;

Parameter b0 will account for the overall intercept (you may remove it later if it is not significant) and bz will account for the average effect of missing x2 values.
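
Written out case by case, the fitted equation shows what bz does. This is only an illustrative sketch: $\beta_2$ and $\mu$ are notation introduced here, and the final line assumes that the same linear model holds for every row and that the missing x2 values have a constant mean $\mu$ unrelated to x1.

```latex
\begin{align*}
z_i = 0 \ (\text{x2 observed}):&\quad
  y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \varepsilon_i, \\
z_i = 1 \ (\text{x2 missing}):&\quad
  y_i = (b_0 + b_z) + b_1 x_{1i} + \varepsilon_i, \\[4pt]
\text{under the stated assumptions:}&\quad
  b_z \approx \beta_2 \, \mu,
  \qquad \mu = \mathrm{E}\left[ x_{2i} \mid z_i = 1 \right].
\end{align*}
```

So bz acts as an intercept shift for the rows with missing x2. If the missing x2 values are related to x1, that shift may not fully capture their effect, and the estimate of b1 could absorb part of it.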

PG

bigbigben
Obsidian | Level 7

Thanks, PG. That is a good way to work around the issue.
