I have fit an elastic net regression with PROC GLMSELECT, and although I get the parameter estimates, I can't do anything with them afterward.
I have tried PROC SCORE and PROC PLM without success. I need to score the second data set to obtain predicted values of the response, SalePrice.
Should I score inside GLMSELECT or with another procedure?
(Yes, there are many PROC PRINT steps in between; I used them at first to check each step.)
Currently my output displays the variables in the model, but PROC PLM does not do anything with them.
options ls=78;

data college;
   infile "C:\Users\Matilde Ruiz\OneDrive\Documents\STAT 580\HousesClean.csv"
          delimiter=',' firstobs=2 missover dsd;
   length RoofMatl $ 10. Neighborhood $ 9.;
   input ID SalePrice Neighborhood $ LotFrontage LotArea LotShape $ LotConfig $
         Utilities $ BldgType $ HouseStyle $ OverallQual OverallCond YearBuilt
         RoofStyle $ RoofMatl $ Exterior1st $ ExterQual $ ExterCond $ Foundation $
         BsmtQual $ BsmtCond $ BsmtFinType1 $ BsmtFinSF1 BsmtUnfSF Heating $
         HeatingQC $ CentralAir $ Electrical $ GrLivArea FullBath HalfBath
         BedroomAbvGr Kitchen KitchenQu $ TotRmsAbvGr Fireplaces GarageType $
         PavedDrive $ WoodDecSF OpenProchSF YrSold SaleType $;
run;
/* proc print data=college; run; */

data coded;
   set college;
   if RoofStyle="Gable" then roofS=0;
   if RoofStyle="notGable" then roofS=1;
   if RoofMatl="CompShg" then roofM=0;
   if RoofMatl="notCompShg" then roofM=1;
   if ExterQual="Gd" then extQ=0;
   if ExterQual="TA" then extQ=1;
   if ExterCond="Gd" then extC=0;
   if ExterCond="TA" then extC=1;
   if BsmtCond="TA" then bsmtC=0;
   if BsmtCond="NA" then bsmtC=1;
   if CentralAir="N" then AC=0;
   if CentralAir="Y" then AC=1;
   if Electrical="SBrkr" then elect=0;
   if Electrical="Fuse" then elect=1;
   if SaleType="WD" then deed=0;
   if SaleType="notWD" then deed=1;
   drop RoofStyle RoofMatl ExterQual ExterCond BsmtCond CentralAir Electrical SaleType;
run;
/* proc print data=coded; run; */

ods graphics on;
proc surveyselect data=coded out=trainset seed=123 samprate=0.8 method=srs outall;
run;
/* proc print data=trainset; run; */

data collegetest;
   infile "C:\Users\Matilde Ruiz\OneDrive\Documents\STAT 580\HousesTest.csv"
          delimiter=',' firstobs=2 missover dsd;
   length RoofMatl $ 10. Neighborhood $ 9.;
   input ID $ Neighborhood $ LotFrontage LotArea LotShape $ LotConfig $
         Utilities $ BldgType $ HouseStyle $ OverallQual OverallCond YearBuilt
         RoofStyle $ RoofMatl $ Exterior1st $ ExterQual $ ExterCond $ Foundation $
         BsmtQual $ BsmtCond $ BsmtFinType1 $ BsmtFinSF1 BsmtUnfSF Heating $
         HeatingQC $ CentralAir $ Electrical $ GrLivArea FullBath HalfBath
         BedroomAbvGr Kitchen KitchenQu $ TotRmsAbvGr Fireplaces GarageType $
         PavedDrive $ WoodDecSF OpenProchSF YrSold SaleType $;
run;

data codedtest;
   set college;   /* BUG: this reads the training data; it should be set collegetest; (found later in the thread) */
   if RoofStyle="Gable" then roofS=0;
   if RoofStyle="notGable" then roofS=1;
   if RoofMatl="CompShg" then roofM=0;
   if RoofMatl="notCompShg" then roofM=1;
   if ExterQual="Gd" then extQ=0;
   if ExterQual="TA" then extQ=1;
   if ExterCond="Gd" then extC=0;
   if ExterCond="TA" then extC=1;
   if BsmtCond="TA" then bsmtC=0;
   if BsmtCond="NA" then bsmtC=1;
   if CentralAir="N" then AC=0;
   if CentralAir="Y" then AC=1;
   if Electrical="SBrkr" then elect=0;
   if Electrical="Fuse" then elect=1;
   if SaleType="WD" then deed=0;
   if SaleType="notWD" then deed=1;
   drop RoofStyle RoofMatl ExterQual ExterCond BsmtCond CentralAir Electrical SaleType;
run;

proc glmselect data=trainset plots=all seed=123 valdata=codedtest;
   partition role=selected(test='0' train='1');
   class Neighborhood LotShape LotConfig BldgType HouseStyle Exterior1st Foundation
         BsmtQual BsmtFinType1 Heating HeatingQC KitchenQu GarageType PavedDrive
         roofS roofM extQ extC bsmtC AC elect deed;
   model SalePrice = Neighborhood LotFrontage LotArea LotShape LotConfig BldgType
         HouseStyle OverallQual OverallCond YearBuilt Exterior1st Foundation BsmtQual
         BsmtFinType1 BsmtFinSF1 BsmtUnfSF Heating HeatingQC GrLivArea FullBath
         HalfBath BedroomAbvGr KitchenQu TotRmsAbvGr Fireplaces GarageType PavedDrive
         WoodDecSF OpenProchSF YrSold roofS roofM extQ extC bsmtC AC elect deed
         / selection=elasticnet(steps=120 choose=cv) cvmethod=random(10);
   store parestimates;
quit;

proc plm restore=parestimates;
   score data=codedtest out=pred;
run;
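The PROC PLM route from the question is another way to score new data. Below is a minimal sketch under stated assumptions, using Sashelp.cars instead of the housing files; the data set names work.train, work.toscore, work.enet_store, and work.plm_scored are hypothetical. It names the prediction column explicitly with PREDICTED= so that it does not depend on PROC PLM's default column name. As with the SCORE statement in PROC GLMSELECT, the DATA= set must contain the explanatory variables of the selected model.

```sas
/* Hypothetical names throughout; Sashelp.cars stands in for the housing data */
data work.train work.toscore;
   if _n_ = 1 then call streaminit(12345);
   set sashelp.cars;
   keep Make Model MPG_City MSRP EngineSize Cylinders Horsepower Weight Wheelbase Length;
   /* Roughly 30% of rows are held out for scoring */
   if rand("Bernoulli", 0.3) then output work.toscore;
   else output work.train;
run;

/* Fit the elastic net and save the selected model in an item store */
proc glmselect data=work.train plots=none seed=123;
   model MPG_City = MSRP EngineSize Cylinders Horsepower Weight Wheelbase Length
         / selection=elasticnet(steps=120 choose=cv) cvmethod=random(10);
   store work.enet_store;
run;

/* Restore the item store and score the held-out rows */
proc plm restore=work.enet_store;
   score data=work.toscore out=work.plm_scored predicted=p_MPG_City;
run;

proc print data=work.plm_scored(obs=5);
   var Make Model MPG_City p_MPG_City;
run;
```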
You can use the SCORE statement in PROC GLMSELECT to create an output data set that contains the predicted (and residual) values. See "SCORE Statement" in the PROC GLMSELECT documentation on the SAS Help Center.
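A minimal sketch of the SCORE statement with both keywords named explicitly, again using Sashelp.cars rather than the housing data (work.cars_fit and work.cars_scored are hypothetical names). Requesting residuals only makes sense when the scoring data contain a nonmissing response; a file such as HousesTest.csv, which has no SalePrice column, can only yield predicted values.

```sas
/* Hypothetical names; the rows being scored still contain MPG_City */
data work.cars_fit;
   set sashelp.cars;
   keep Make Model MPG_City EngineSize Horsepower Weight Wheelbase Length;
run;

proc glmselect data=work.cars_fit plots=none seed=123;
   model MPG_City = EngineSize Horsepower Weight Wheelbase Length
         / selection=elasticnet(steps=120 choose=cv) cvmethod=random(10);
   /* Name both output columns explicitly: predicted and residual values */
   score data=work.cars_fit out=work.cars_scored predicted=yhat residual=resid;
run;

proc print data=work.cars_scored(obs=5);
   var Make Model MPG_City yhat resid;
run;
```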
Thank you. I tried this as well, but it is not giving me the predicted values.
This is the change I made. The second data set contains 67 observations, but 246 are being scored, so I suspect it is rescoring the original data. Does the SCORE statement go before the MODEL statement?
proc glmselect data=trainset plots=all seed=123 valdata=codedtest;
partition role=selected(test='0' train='1');
class Neighborhood LotShape LotConfig BldgType HouseStyle Exterior1st Foundation BsmtQual BsmtFinType1 Heating HeatingQC
KitchenQu GarageType PavedDrive roofS roofM extQ extC bsmtC AC elect deed;
model SalePrice= Neighborhood LotFrontage LotArea LotShape LotConfig BldgType HouseStyle OverallQual OverallCond YearBuilt Exterior1st
Foundation BsmtQual BsmtFinType1 BsmtFinSF1 BsmtUnfSF Heating HeatingQC GrLivArea FullBath HalfBath
BedroomAbvGr KitchenQu TotRmsAbvGr Fireplaces GarageType PavedDrive WoodDecSF OpenProchSF YrSold roofS roofM extQ extC bsmtC AC elect deed
/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
score data=codedtest predicted out=predictions;
quit;
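On the ordering question: placing the SCORE statement after the MODEL statement, as in the code above, works, and SCORE scores whichever data set its DATA= option names. A quick way to confirm which data set was scored is to compare the row counts of the SCORE input and output. A sketch with Sashelp.cars (work.fit_rows, work.new_rows, and work.scored_rows are hypothetical names):

```sas
/* Hypothetical names; Sashelp.cars stands in for trainset and codedtest */
data work.fit_rows work.new_rows;
   set sashelp.cars;
   keep Origin MPG_City EngineSize Horsepower Weight Wheelbase Length;
   if Origin = "Asia" then output work.new_rows;
   else output work.fit_rows;
run;

proc glmselect data=work.fit_rows plots=none seed=123;
   model MPG_City = EngineSize Horsepower Weight Wheelbase Length
         / selection=elasticnet(steps=120 choose=cv) cvmethod=random(10);
   /* SCORE placed after MODEL; it scores the data set named in DATA= */
   score data=work.new_rows out=work.scored_rows predicted=p_MPG_City;
run;

/* The two counts should agree */
proc sql;
   select count(*) as n_to_score from work.new_rows;
   select count(*) as n_scored from work.scored_rows;
quit;
```

In the housing code, if the scored count is not the 67 rows of the test file, the SCORE DATA= set is not the data set you intended.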
I found the error in the test data: when recoding the test file, I used set college; instead of set collegetest;.
After fixing that, it now reads the right input, but it is still not scoring: there are no predicted values.
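Because the training and test recodes were separate copies of the same DATA step, it was easy for one copy to read the wrong input. One way to guard against that, shown as a hypothetical sketch rather than the poster's code, is to put the recoding in a macro (here named %recode) and call it once per data set. Only two of the thread's recodes are included, and the tiny inline data sets are made up purely for illustration.

```sas
/* Hypothetical macro: apply identical recoding to any input data set */
%macro recode(in=, out=);
   data &out;
      set &in;
      if RoofStyle = "Gable" then roofS = 0;
      else if RoofStyle = "notGable" then roofS = 1;
      if CentralAir = "N" then AC = 0;
      else if CentralAir = "Y" then AC = 1;
      drop RoofStyle CentralAir;
   run;
%mend recode;

/* Made-up inputs standing in for college and collegetest */
data work.demo_train;
   input ID RoofStyle :$8. CentralAir :$1.;
   datalines;
1 Gable Y
2 notGable N
;
run;

data work.demo_test;
   input ID RoofStyle :$8. CentralAir :$1.;
   datalines;
3 notGable Y
;
run;

%recode(in=work.demo_train, out=work.demo_train_coded)
%recode(in=work.demo_test, out=work.demo_test_coded)

proc print data=work.demo_test_coded;
run;
```

With the thread's data sets, after adding the remaining recodes to the macro, the calls would be %recode(in=college, out=coded) and %recode(in=collegetest, out=codedtest).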
The predicted values should be in the PREDICTIONS data set in a variable named p_SalePrice.
Since I do not have the data you are using, here is an example that uses Sashelp.cars. The syntax is the same as yours. The call to PROC SGPLOT creates a histogram of the p_MPG_City variable, which holds the predicted values, to show that the SCORE statement is generating them.
data trainset codedtest;
call streaminit(12345);
set sashelp.cars;
if rand("Bernoulli", 0.3) then
output codedtest;
else do;
selected = rand("Bernoulli", 0.6);
output trainset;
end;
keep Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City Weight Wheelbase Length selected;
run;
proc glmselect data=trainset plots=none seed=123 valdata=codedtest;
partition role=selected(test='0' train='1');
class Type Origin DriveTrain;
model MPG_City = MSRP Invoice EngineSize Cylinders Horsepower Weight Wheelbase Length
/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
score data=codedtest predicted out=predictions;
quit;
proc sgplot data=predictions;
histogram p_MPG_City;
run;
If your output does not have the p_SalePrice variable, then there might be something wrong with your input data set. For example, perhaps it does not contain all of the variables in the model or it has all missing values for one or more explanatory variables.
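To check the scoring data set for the problems described above, you can count missing values per explanatory variable and list the rows with at least one missing predictor. A sketch using Sashelp.cars in place of codedtest; the variable lists and the name work.unscorable are assumptions for illustration. Rows listed there may receive a missing predicted value if the missing variable is in the selected model.

```sas
/* Sashelp.cars stands in for codedtest; variable lists are assumptions */
proc means data=sashelp.cars n nmiss;
   var MSRP Invoice EngineSize Cylinders Horsepower Weight Wheelbase Length;
run;

/* Character (CLASS) variables: show missing values as their own level */
proc freq data=sashelp.cars;
   tables Type Origin DriveTrain / missing;
run;

/* Keep rows with at least one missing numeric predictor (hypothetical name) */
data work.unscorable;
   set sashelp.cars;
   if nmiss(MSRP, Invoice, EngineSize, Cylinders, Horsepower,
            Weight, Wheelbase, Length) > 0;
run;

proc print data=work.unscorable;
   var Make Model Cylinders;
run;
```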
Thank you so much!
With the plot, I could see that it was actually computing the predicted sale price. I then added a PROC PRINT step, and it returned a table with the variables as well as the predicted values.
I am going to experiment more with the variables to get better estimates, but the scoring itself now works.
proc glmselect data=trainset plots=all seed=123;
partition role=selected(test='0' train='1');
class Neighborhood LotShape LotConfig BldgType HouseStyle Exterior1st Foundation BsmtQual BsmtFinType1 Heating HeatingQC
KitchenQu GarageType PavedDrive roofS roofM extQ extC bsmtC AC elect deed;
model SalePrice= Neighborhood LotFrontage LotArea LotShape LotConfig BldgType HouseStyle OverallQual OverallCond YearBuilt Exterior1st
Foundation BsmtQual BsmtFinType1 BsmtFinSF1 BsmtUnfSF Heating HeatingQC GrLivArea FullBath HalfBath
BedroomAbvGr KitchenQu TotRmsAbvGr Fireplaces GarageType PavedDrive WoodDecSF OpenProchSF YrSold roofS roofM extQ extC bsmtC AC elect deed
/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
score data=codedtest predicted out=predictions;
run;
proc print data=predictions;
run;