Hello everyone!
I am using a student version of SAS and trying to do predictive modeling, but I am stuck at the "scoring step". I have chosen a model and saved its scoring code, but I do not know how to apply it to another set of data.
Can someone explain, please?
The answer depends on which SAS procedure you are using to develop your model. If the procedure has a SCORE or CODE statement in its syntax, use those statements to score your new data set. If the procedure has a STORE statement, you can store your model in a SAS item store and then score new observations later using PROC PLM. See this example for help.
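As a rough illustration of the STORE and CODE routes described above, here is a minimal sketch. The data set names (work.train, work.newdata), the variables (y, x1, x2), the item store name (work.mymodel), and the fileref (scorecd) are all hypothetical placeholders, not names from this thread.

```sas
/* Hypothetical names: work.train (modeling data), work.newdata (data to score), */
/* continuous response y, predictors x1 and x2.                                  */

filename scorecd temp;            /* temporary file; point to a permanent path to keep the code */

proc glmselect data=work.train;
   model y = x1 x2 / selection=none;
   store work.mymodel;            /* route 1: save the fitted model in an item store */
   code file=scorecd;             /* route 2: write DATA step scoring code to a file */
run;

/* Route 1: restore the item store and score new observations with PROC PLM */
proc plm restore=work.mymodel;
   score data=work.newdata out=work.scored_plm predicted=Pred;
run;

/* Route 2: run the generated scoring code inside a DATA step */
data work.scored_code;
   set work.newdata;
   %include scorecd;              /* adds the predicted value as a new variable */
run;
```

Either route lets you fit the model once and score new data later without refitting, provided the new data contain the same predictor variables as the modeling data.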
A trick that will work with any SAS modeling procedure is to add the scoring observations to the modeling data set. Set the target or response variable to a missing value for those observations. When the model is fit, only those observations with nonmissing responses will be used to create the model. You can request predictions as you would for any other use of that modeling procedure. See Rick's blog post on this topic for help.
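The missing-response trick can be sketched as follows. The names work.train, work.newdata, y, x1, x2, and ScoreRow are assumptions made for illustration, and PROC REG stands in for whichever modeling procedure you actually use.

```sas
/* Hypothetical names: work.train, work.newdata, response y, predictors x1 x2 */
data work.combined;
   set work.train
       work.newdata(in=isNew);
   if isNew then y = .;        /* missing response: the row is left out of the fit */
   ScoreRow = isNew;           /* flag so the scored rows are easy to pick out     */
run;

proc reg data=work.combined;
   model y = x1 x2;
   output out=work.pred predicted=Pred;   /* predictions for every row with complete predictors */
run; quit;

/* Show only the rows that were appended for scoring */
proc print data=work.pred;
   where ScoreRow = 1;
   var x1 x2 Pred;
run;
```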
See Rick's other blog post on scoring in general.
What happened is that I haven't succeeded in getting the predictions. I am a complete novice...
Could you tell me whether my sequence of steps was right?
1) found the fitting model
2) made a scoring data set
3) combined the scoring data set with the data set that contains missing Y values, using a "set" statement
4) ran the predictive regression modeling all over again
5) the predicted Y values should appear in the "output data" section
Are you using one of the predefined tasks in SAS Studio or are you writing procedure code? Post your proc code, or the code generated by your SAS Studio task, or a screen shot of the SAS Studio task in question.
This is what the code looks like for the predictive regression. I am using the predefined SAS tasks.
proc glmselect data=STAT1.AMESHOUSING3 plots=(criterionpanel);
   partition fraction(validate=0.2);
   class Heating_QC Central_Air Fireplaces / param=glm;
   model SalePrice=Heating_QC Central_Air Fireplaces Lot_Area Garage_Area
         Basement_Area Age_Sold / selection=backward
         (select=sbc choose=validate) hierarchy=single;
   score out=work.Score1 predicted residual;
   code file=sfile;
run;
filename sfile CLEAR;
Thank you for the help.
The task does not appear to have a way to score a new set of data by default. You can edit the code generated by the task, though. Click on Edit in the code window, and then insert DATA=YourDataName in the SCORE statement. By default, the SCORE statement scores the input data set for the procedure, the one specified through the DATA= option on the PROC statement. Adding your own DATA= to the SCORE statement changes the set of data for which you will get predictions. Want both? Then use a second SCORE statement.
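For example, the edited task code might look like the sketch below. The data set work.NewHouses and the output name work.ScoreNew are hypothetical; substitute your own scoring data set, which must contain the same predictor variables as the training data.

```sas
proc glmselect data=STAT1.AMESHOUSING3 plots=(criterionpanel);
   partition fraction(validate=0.2);
   class Heating_QC Central_Air Fireplaces / param=glm;
   model SalePrice=Heating_QC Central_Air Fireplaces Lot_Area Garage_Area
         Basement_Area Age_Sold / selection=backward
         (select=sbc choose=validate) hierarchy=single;
   /* Original statement: scores the input (DATA=) data set */
   score out=work.Score1 predicted residual;
   /* Added statement: scores a new data set (hypothetical name work.NewHouses) */
   score data=work.NewHouses out=work.ScoreNew predicted;
run;
```

The RESIDUAL keyword is left off the second SCORE statement here on the assumption that the new data have no observed SalePrice, in which case there is nothing to compute a residual against.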
Thanks a lot, Toni94! I have started digging into your suggestions and will come back if I need more help.
Thanks again!
Mou
Following your suggestions, I went through the following process:
1) Fit the model
2) Take the scored variable for the target
3) Set the target response to 0 where P_1 is low, e.g. if P_1<.50 then response=0 /* Incidence rate is very low and most of the predictor values are missing due to customer inactivity for a long time */
4) Run the logistic regression model again, using P_1 as a predictor
5) Fit the model
6) Score the new data set based on that model.
Please find below the code I'm using:
/********************************************************************/
/* fit logistic regression model */
/********************************************************************/
proc logistic data = RPT_Response_XX_2_2
              outmodel = rpt_response_XX_model_param
              descending
              namelen = 32
              ;
   model response
         =
         &munvar
         P_new
         /
         selection = stepwise
         link = logit
         outroc = trn_outroc
         roceps = 0.0001
         ctable
         pprob = (0.00 to 1.00 by 0.01)
         ;
   output out = XX_churn_mod_pred
          predprobs = individual
          ;
run; quit;
/********************************************************************/
/* score dataset */
/********************************************************************/
proc logistic inmodel = rpt_response_XX_model_param;
   score data = rpt_response_XX_all
         out = RPT_Response_XX_scored_data
         outroc = trn_outroc
         ;
run;
Once I have the scored data set, how should I interpret P_1? Does it also give probability scores for the same type of customers as those with a missing response (the real target)? I'm really stuck on that point. Your quick guidance would be highly appreciated.
Thanks again for your suggestions.
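For readers following along, one way to start examining P_1 in the scored output is sketched below. In the SCORE output of PROC LOGISTIC, P_1 holds the model's predicted probability that response equals 1 for each row. The 0.5 cutoff, the variable PredictedResponder, and the data set work.flagged are illustrative assumptions, not part of the original post.

```sas
/* List the variables the SCORE statement added to the output data set */
proc contents data=RPT_Response_XX_scored_data;
run;

/* Summarize the predicted probability of response = 1 */
proc means data=RPT_Response_XX_scored_data n mean min max;
   var P_1;
run;

/* Turn the probability into a 0/1 flag using an illustrative cutoff of 0.5 */
data work.flagged;
   set RPT_Response_XX_scored_data;
   PredictedResponder = (P_1 >= 0.5);
run;
```

With a very low incidence rate, a cutoff of 0.5 may flag few or no rows, so the cutoff is something to choose for your own data rather than a fixed rule.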