Hello all,
Background
I am working with salary data for two groups of employees (group A and group B) in order to assess salary equity. One method I have been asked to use is to fit a multiple linear regression model to group A's data and use the resulting estimates to predict the salary of each member of group B. Anyone in group B whose actual salary is lower than group A's model predicts is then flagged for review. However, I want to flag only those members of group B whose actual salaries fall outside the confidence limits for the expected value (mean) of the prediction (LCLM and UCLM in proc glm).
My Issue
Since I am executing a multiple linear regression model, proc reg and proc glm are the two procedures at my disposal. I have searched for days trying to figure out how to not only predict group B's salary, but to also compute confidence limits for those predictions as well -- to no avail. I am convinced there must be a way to do this simple task, especially since it seems to be an option in proc logistic.
Here is an example of what I've tried that only produces predicted values for group B with no additional options:
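/* Fit the model on group A and save the parameter estimates */
proc reg data=groupA outest=GroupAModel;
    Salary_P: model Salary = x1 x2 x3;
run;
quit;
/* Apply group A's estimates to group B (predicted values only) */
proc score data=groupB score=GroupAModel type=parms predict out=work.GroupBPredict;
    var x1 x2 x3;
run;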
I've also tried proc glm, which lets me compute additional statistics, but only for the same data set the regression model was fitted on. I have not figured out how to do the same thing for a new data set using group A's model.
/* Predictions and confidence limits for the mean, for group A only */
proc glm data=groupA;
    model Salary = x1 x2 x3;
    output out=work.GroupAPredict p=predicted_salary LCLM=LowerBound UCLM=UpperBound
           student=zscore residual=resid / ALPHA=0.05;
run;
quit;
Any insight into how I can use group A's model to predict group B's salary with confidence limits would be greatly appreciated!
FYI: I am using SAS Enterprise Guide 7.1.
Thank you!
D
Two methods:
1) Add the Group B data to your data with missing values for the dependent variable. The OUTPUT= dataset will then include predictions and statistics for the Group B data.
2) STORE your fitted model and use proc PLM to get the predicted means and statistics.
Both methods work with proc REG and proc GLM.
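For illustration, here is a minimal sketch of both methods using proc GLM and the variable names from the question. It assumes groupB has the same predictors (x1 x2 x3) and its own Salary column. The names ActualSalary, FromGroupB, OutsideLimits, work.Combined, work.Scored, work.GroupBFlagged and work.GroupAStore are made up for the example, so adapt them as needed.

```sas
/* Method 1: stack group B under group A with a missing response.       */
/* Renaming Salary on the group B side leaves Salary missing for those  */
/* rows, so they are left out of the fit but still get predictions.     */
data work.Combined;
    set groupA (in=inA)
        groupB (rename=(Salary=ActualSalary));
    FromGroupB = not inA;   /* hypothetical indicator: 1 = group B row */
run;

proc glm data=work.Combined;
    model Salary = x1 x2 x3;
    output out=work.Scored p=predicted_salary
           lclm=LowerBound uclm=UpperBound / alpha=0.05;
run;
quit;

/* Keep group B and flag salaries outside the limits for the mean. */
/* Rows with a missing salary or missing limits are not flagged.   */
data work.GroupBFlagged;
    set work.Scored;
    where FromGroupB = 1;
    OutsideLimits = (nmiss(ActualSalary, LowerBound, UpperBound) = 0)
                    and (ActualSalary < LowerBound or ActualSalary > UpperBound);
run;

/* Method 2: store group A's fitted model, then score group B with PLM. */
proc glm data=groupA;
    model Salary = x1 x2 x3;
    store work.GroupAStore;   /* hypothetical item store name */
run;
quit;

proc plm restore=work.GroupAStore;
    score data=groupB out=work.GroupBPredict
          predicted=predicted_salary lclm=LowerBound uclm=UpperBound / alpha=0.05;
run;
```

With method 2, group B keeps its own Salary column in the scored output, so the same flagging comparison can be run directly on work.GroupBPredict. To flag only salaries below the model's range, compare against LowerBound alone.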
PG,
Thank you for such a quick response and simple solution! You've made my week so much better. 🙂
D