Novice user here! I am trying to predict salary from variables such as gender, job function, retention, and performance, while accounting for the fact that people are in different salary grades, which by itself causes salaries to differ from grade to grade. I don't think I am doing this correctly. Can someone please help?
ods graphics on;
title "Selection Method LASSO Using Cross Validation";
proc glmselect data=train testdata=test
      plots(stepAxis=number)=(criterionPanel ASEPlot);
   model salary = grade F1 F2 F3 F4 F5 F6 F7 F8 F9 Gender
         Reten Minority P2 P1 AgeCat / selection=LASSO(choose=CV stop=CV) CVDETAILS;
run;
%put &=_glsind;
proc glm data=test;
   model Salary = &_glsind / solution clParm;
run;
quit;
ods graphics off;
You say "I don't think I am doing this correctly." Is there an error? If so, please post the log.
I don't know the form of your data (you should always post some sample data when possible), but I'm guessing that some of the variables should be defined as classification variables by using the CLASS statement. Any discrete or categorical variable should be listed in the CLASS statement, like this:
proc glmselect data = train testdata=test;
class Gender Minority AgeCat /* any others? */;
model salary = grade F1 F2 F3 F4 F5 F6 F7 F8 F9 Gender
Reten Minority P2 P1 AgeCat / selection=LASSO(choose=CV stop=CV) CVDETAILS;
run;
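On the "any others?" comment: one candidate is grade itself. This is only a sketch, and it assumes that grade holds salary-grade labels (categories) rather than a numeric scale; I can't tell from the post. If that assumption holds, listing grade in the CLASS statement lets each grade have its own level effect instead of a single linear slope, which is closer to "accounting for" grade differences:

```sas
/* Sketch only: assumes grade is a categorical salary-grade label. */
/* If grade is truly numeric and you want a linear effect, leave it off CLASS. */
proc glmselect data=train testdata=test;
   class grade Gender Minority AgeCat;   /* grade is now treated as a classification effect */
   model salary = grade F1 F2 F3 F4 F5 F6 F7 F8 F9 Gender
         Reten Minority P2 P1 AgeCat / selection=LASSO(choose=CV stop=CV) CVDETAILS;
run;
```

Everything else is unchanged from the call above, so the selection method can still keep or drop grade like any other effect.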