This is my first time posting to the SAS community, so thank you all for your help! My question is more about statistical theory.

In a linear regression model, would it be statistically sound to use the average of the dependent variable as an independent variable? Specifically, I am building a model to predict the number of days until a project is completed. I would like to create a variable named “average days”, which is the average time to complete a project within each zip code. In this aggregation, an observation’s own project time would not contribute to the average for that observation, but it would be used in the averages for the other observations in the same zip code.

The new variable “average days” is not strongly correlated with the other variables in the model. The strongest correlation is 0.4, with another variable that determines the likelihood that a permit will be needed for the job. I have attached a sample of the code if you are interested.

Assuming there is no multicollinearity or model overfitting, would there be statistical or mathematical concerns with this method? If so, could you explain why?
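To make the aggregation concrete, here is a minimal sketch of a leave-one-out zip-code average, written in Python rather than SAS purely for illustration. For a project i in zip code z, the value is (total days in z minus days for i) divided by (number of projects in z minus 1). The function name `leave_one_out_means` and the sample values are hypothetical and are not taken from the attached code. How zip codes with only one project should be handled is not stated in the post; returning `None` for them is an assumption.

```python
from collections import defaultdict


def leave_one_out_means(zips, days):
    """For each project, return the mean completion time of the OTHER
    projects in the same zip code, or None if it is the only one.
    (Hypothetical helper; the None convention is an assumption.)"""
    totals = defaultdict(float)  # sum of days per zip code
    counts = defaultdict(int)    # number of projects per zip code
    for z, d in zip(zips, days):
        totals[z] += d
        counts[z] += 1

    result = []
    for z, d in zip(zips, days):
        n_other = counts[z] - 1
        if n_other == 0:
            result.append(None)  # no other projects to average over
        else:
            # Remove this project's own time before averaging.
            result.append((totals[z] - d) / n_other)
    return result


# Hypothetical example data (not from the original post).
example = leave_one_out_means(
    ["10001", "10001", "10001", "94105", "94105", "60601"],
    [30, 40, 50, 20, 60, 45],
)
print(example)  # [45.0, 40.0, 35.0, 60.0, 20.0, None]
```

Computing the zip-code totals once and subtracting each observation's own value avoids re-scanning the group for every row, and it guarantees that an observation never contributes to its own average.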