☑ This topic is solved.
Mruizv
Obsidian | Level 7

I have fit an elastic net regression with PROC GLMSELECT, and while I get the parameter estimates, I can't do anything with them afterward.

I have tried PROC SCORE and PROC PLM to no avail. I need to score the second data set so I can obtain the predicted response, SalePrice.

Should I score inside GLMSELECT or with another procedure?

(Yes, there are many PROC PRINT steps in between; I used them to check each step initially.)

Currently my output displays the variables in the model, but PROC PLM is not doing anything with them.

Mruizv_0-1713139212156.png

 

options ls=78;


 data college;
  infile "C:\Users\Matilde Ruiz\OneDrive\Documents\STAT 580\HousesClean.csv" delimiter=',' firstobs=2 missover dsd;
  length RoofMatl $ 10. Neighborhood $ 9.;
  input ID SalePrice Neighborhood $ LotFrontage LotArea LotShape $ LotConfig $ Utilities $  BldgType $ HouseStyle $ OverallQual OverallCond YearBuilt RoofStyle $ RoofMatl $ Exterior1st $ 
		ExterQual $ ExterCond $ Foundation $ BsmtQual $ BsmtCond $ BsmtFinType1 $ BsmtFinSF1 BsmtUnfSF Heating $ HeatingQC $ CentralAir $ Electrical $ GrLivArea FullBath HalfBath 
		BedroomAbvGr Kitchen KitchenQu $ TotRmsAbvGr Fireplaces GarageType $ PavedDrive $ WoodDecSF OpenProchSF YrSold SaleType $;
  run;

  /*
proc print data=college;
run;
  */

 data coded;
set college;
if RoofStyle="Gable" then roofS=0;
if RoofStyle="notGable" then roofS=1;
if RoofMatl="CompShg" then roofM=0;
if RoofMatl="notCompShg" then roofM=1;
if ExterQual="Gd" then extQ=0;
if ExterQual="TA" then extQ=1;
if ExterCond="Gd" then extC=0;
if ExterCond="TA" then extC=1;
if BsmtCond="TA" then bsmtC=0;
if BsmtCond="NA" then bsmtC=1;
if CentralAir="N" then AC=0;
if CentralAir="Y" then AC=1;
if Electrical="SBrkr" then elect=0;
if Electrical="Fuse" then elect=1;
if SaleType="WD" then deed=0;
if SaleType="notWD" then deed=1;

drop RoofStyle RoofMatl ExterQual ExterCond BsmtCond CentralAir Electrical SaleType;
run;

/*
proc print data=coded;
run;
*/

 ods graphics on;

 proc surveyselect data=coded out=trainset seed=123
 samprate=0.8 method=srs outall;
 run;


/*
proc print data=trainset;
run;
*/

 data collegetest;
  infile "C:\Users\Matilde Ruiz\OneDrive\Documents\STAT 580\HousesTest.csv" delimiter=',' firstobs=2 missover dsd;
  length RoofMatl $ 10. Neighborhood $ 9.;
  input ID $ Neighborhood $ LotFrontage LotArea LotShape $ LotConfig $ Utilities $  BldgType $ HouseStyle $ OverallQual OverallCond YearBuilt RoofStyle $ RoofMatl $ Exterior1st $ 
		ExterQual $ ExterCond $ Foundation $ BsmtQual $ BsmtCond $ BsmtFinType1 $ BsmtFinSF1 BsmtUnfSF Heating $ HeatingQC $ CentralAir $ Electrical $ GrLivArea FullBath HalfBath 
		BedroomAbvGr Kitchen KitchenQu $ TotRmsAbvGr Fireplaces GarageType $ PavedDrive $ WoodDecSF OpenProchSF YrSold SaleType $;
  run;

 data codedtest;
set college;
if RoofStyle="Gable" then roofS=0;
if RoofStyle="notGable" then roofS=1;
if RoofMatl="CompShg" then roofM=0;
if RoofMatl="notCompShg" then roofM=1;
if ExterQual="Gd" then extQ=0;
if ExterQual="TA" then extQ=1;
if ExterCond="Gd" then extC=0;
if ExterCond="TA" then extC=1;
if BsmtCond="TA" then bsmtC=0;
if BsmtCond="NA" then bsmtC=1;
if CentralAir="N" then AC=0;
if CentralAir="Y" then AC=1;
if Electrical="SBrkr" then elect=0;
if Electrical="Fuse" then elect=1;
if SaleType="WD" then deed=0;
if SaleType="notWD" then deed=1;

drop RoofStyle RoofMatl ExterQual ExterCond BsmtCond CentralAir Electrical SaleType;
run;


 proc glmselect data=trainset plots=all seed=123 valdata=codedtest;
 	partition role=selected(test='0' train='1');
	class Neighborhood LotShape LotConfig BldgType HouseStyle Exterior1st Foundation BsmtQual BsmtFinType1 Heating HeatingQC
		KitchenQu GarageType PavedDrive roofS roofM extQ extC bsmtC AC elect deed;
	model SalePrice= Neighborhood LotFrontage LotArea LotShape LotConfig BldgType HouseStyle OverallQual OverallCond YearBuilt  Exterior1st 
		Foundation BsmtQual BsmtFinType1 BsmtFinSF1 BsmtUnfSF Heating HeatingQC GrLivArea FullBath HalfBath
		BedroomAbvGr KitchenQu TotRmsAbvGr Fireplaces GarageType PavedDrive WoodDecSF OpenProchSF YrSold roofS roofM extQ extC bsmtC AC elect deed
		/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
	store parestimates;		
quit;


 proc plm restore=parestimates;
 score data=codedtest out=pred;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

The predicted values should be in the PREDICTIONS data set in a variable named p_SalePrice.

 

Since I do not have the data you are using, here is an example that uses Sashelp.cars. The syntax is the same as yours. The call to PROC SGPLOT creates a histogram of the p_MPG_City variable (= the predicted values) to demonstrate that the SCORE statement is generating the predicted values.

 

data trainset codedtest;
call streaminit(12345);
set sashelp.cars;
if rand("Bernoulli", 0.3) then 
   output codedtest;
else do;
   selected = rand("Bernoulli", 0.6);
   output trainset;
end;
keep Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City Weight Wheelbase Length selected;
run;

proc glmselect data=trainset plots=none seed=123 valdata=codedtest;
 	partition role=selected(test='0' train='1');
	class Type Origin DriveTrain;
	model MPG_City = MSRP Invoice EngineSize Cylinders Horsepower Weight Wheelbase Length
		/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
	score data=codedtest predicted out=predictions;		
quit;

proc sgplot data=predictions;
   histogram p_MPG_City;
run;

If your output does not have the p_SalePrice variable, then there might be something wrong with your input data set. For example, perhaps it does not contain all of the variables in the model or it has all missing values for one or more explanatory variables.
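
The two checks suggested above can be sketched as follows. This is only a rough sketch, assuming the scoring data set is named codedtest as in the question; the format name $missfmt is a made-up helper, not something from the original post. Compare the PROC CONTENTS listing against the effects in the MODEL statement, and look for variables whose NMISS count equals the number of observations.

```sas
/* List variable names and types in the scoring data;
   compare them by eye against the MODEL statement */
proc contents data=codedtest varnum;
run;

/* N and NMISS for every numeric variable (including the 0/1 recodes) */
proc means data=codedtest n nmiss;
   var _numeric_;
run;

/* Hypothetical helper format: collapse character values
   into "Missing" versus "Not missing" */
proc format;
   value $missfmt ' ' = 'Missing' other = 'Not missing';
run;

/* Missing versus nonmissing counts for every character variable */
proc freq data=codedtest;
   format _character_ $missfmt.;
   tables _character_ / missing nocum nopercent;
run;
```

An observation with a missing value in any explanatory variable used by the model would be expected to get a missing predicted value, so a variable that is missing everywhere would leave the whole prediction column empty.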


6 REPLIES 6
Ksharp
Super User
PROC GLMSELECT is designed for variable selection, not for scoring a regression model.
I think you need another procedure, such as PROC GLM or PROC GENMOD, to score a new data set with the variables you selected in PROC GLMSELECT. Check @Rick_SAS's blog:

https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
https://blogs.sas.com/content/iml/2018/08/06/score-quantile-regression-sas.html
Rick_SAS
SAS Super FREQ

You can use the SCORE statement in PROC GLMSELECT to create an output data set that contains the predicted (and residual) values:

SAS Help Center: SCORE Statement

 

Mruizv
Obsidian | Level 7

Thank you. I tried this as well, but it is not giving me the predicted output.

Mruizv_0-1713179437243.png

This is the change I made. The second data set contains 67 observations, but 246 are being scored, so my sense is that it is re-scoring the original data. Does the SCORE statement go before the MODEL statement?

 proc glmselect data=trainset plots=all seed=123 valdata=codedtest;
 	partition role=selected(test='0' train='1');
	class Neighborhood LotShape LotConfig BldgType HouseStyle Exterior1st Foundation BsmtQual BsmtFinType1 Heating HeatingQC
		KitchenQu GarageType PavedDrive roofS roofM extQ extC bsmtC AC elect deed;
	model SalePrice= Neighborhood LotFrontage LotArea LotShape LotConfig BldgType HouseStyle OverallQual OverallCond YearBuilt  Exterior1st 
		Foundation BsmtQual BsmtFinType1 BsmtFinSF1 BsmtUnfSF Heating HeatingQC GrLivArea FullBath HalfBath
		BedroomAbvGr KitchenQu TotRmsAbvGr Fireplaces GarageType PavedDrive WoodDecSF OpenProchSF YrSold roofS roofM extQ extC bsmtC AC elect deed
		/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
	score data=codedtest predicted out=predictions;		
quit;
Mruizv
Obsidian | Level 7

Found the error for the test data: in the DATA step that codes the test file, I was using SET college instead of SET collegetest.

Once I fixed that, it is still not scoring, but at least it is reading the right input. No predicted values yet.

Mruizv_0-1713180391096.png
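
For reference, the corrected coding step might look like the sketch below. It assumes collegetest has already been created by the earlier DATA step that reads HousesTest.csv; the recodes are copied from the original post, and the final PROC SQL step is an added check, not part of the original program.

```sas
/* Code the TEST data: read collegetest, not the training data set college */
data codedtest;
   set collegetest;                    /* was: set college; */
   if RoofStyle="Gable" then roofS=0;
   if RoofStyle="notGable" then roofS=1;
   if RoofMatl="CompShg" then roofM=0;
   if RoofMatl="notCompShg" then roofM=1;
   if ExterQual="Gd" then extQ=0;
   if ExterQual="TA" then extQ=1;
   if ExterCond="Gd" then extC=0;
   if ExterCond="TA" then extC=1;
   if BsmtCond="TA" then bsmtC=0;
   if BsmtCond="NA" then bsmtC=1;
   if CentralAir="N" then AC=0;
   if CentralAir="Y" then AC=1;
   if Electrical="SBrkr" then elect=0;
   if Electrical="Fuse" then elect=1;
   if SaleType="WD" then deed=0;
   if SaleType="notWD" then deed=1;
   drop RoofStyle RoofMatl ExterQual ExterCond BsmtCond CentralAir Electrical SaleType;
run;

/* Added check: the count should match the 67 observations in the test file */
proc sql;
   select count(*) as n_obs from codedtest;
quit;
```

If the count still shows 246, the step is probably still reading the training data.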

 

Mruizv
Obsidian | Level 7

Thank you so much!

With the plot I was able to see that it was actually computing my sale price, so I just added a PROC PRINT and it returned the table with the variables as well as the predicted values.

I am going to play some more with the variables to get better estimates, but the actual functionality is there.

proc glmselect data=trainset plots=all seed=123;
 	partition role=selected(test='0' train='1');
	class Neighborhood LotShape LotConfig BldgType HouseStyle Exterior1st Foundation BsmtQual BsmtFinType1 Heating HeatingQC
		KitchenQu GarageType PavedDrive roofS roofM extQ extC bsmtC AC elect deed;
	model SalePrice= Neighborhood LotFrontage LotArea LotShape LotConfig BldgType HouseStyle OverallQual OverallCond YearBuilt  Exterior1st 
		Foundation BsmtQual BsmtFinType1 BsmtFinSF1 BsmtUnfSF Heating HeatingQC GrLivArea FullBath HalfBath
		BedroomAbvGr KitchenQu TotRmsAbvGr Fireplaces GarageType PavedDrive WoodDecSF OpenProchSF YrSold roofS roofM extQ extC bsmtC AC elect deed
		/selection=elasticnet (steps=120 choose=cv) cvmethod=random(10);
	score data=codedtest predicted out=predictions;	
run;

proc print data=predictions;
run;
