Solved: Re: Predictive Regression-scoring issues

Toni94 · Posted 03-21-2019 03:24 AM

Hello everyone!

I am using a student version of SAS and am trying to do predictive modeling, but I am stuck at the scoring step. I have chosen a model and saved its scoring code, but I do not know how to apply it to another set of data.

Can someone explain, please?

StatsMan · Posted 03-21-2019 08:36 AM

The answer will depend on which SAS Procedure you are using to develop your model. If the procedure has a SCORE or CODE statement in its syntax, then use those statements to score your new data set. If the procedure has a STORE statement, then you can store your model in a SAS item store and then score new observations later using PROC PLM. See this example for help.
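As a rough sketch of the CODE and STORE routes, assuming a small model fit on SASHELP.CLASS: the data set work.NewKids, the item store work.ClassModel, and the fileref scode are made-up names for illustration, not anything from the original question.

```sas
/* Hypothetical new observations to score; Weight is unknown. */
data work.NewKids;
   input Height Age;
   datalines;
60 13
65 15
;
run;

/* Temporary file that will hold the generated DATA step scoring code. */
filename scode temp;

proc glmselect data=sashelp.class;
   model Weight = Height Age / selection=none;
   store work.ClassModel;   /* save the fitted model in an item store */
   code file=scode;         /* write scoring code to the fileref      */
run;

/* Route 1: score later with PROC PLM from the item store. */
proc plm restore=work.ClassModel;
   score data=work.NewKids out=work.ScoredPLM predicted=PredWeight;
run;

/* Route 2: run the saved scoring code inside a DATA step. */
data work.ScoredCode;
   set work.NewKids;
   %include scode;
run;
```

With a temporary fileref the scoring code is gone once the session ends or the fileref is cleared, so pointing the fileref at a permanent path is probably the better choice if the code is to be reused later.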

A trick that will work with any SAS modeling procedure is to add the scoring observations to the modeling data set. Set the target or response variable to a missing value for those observations. When the model is fit, only those observations with nonmissing responses will be used to create the model. You can request predictions as you would for any other use of that modeling procedure. See Rick's blog post on this topic for help.
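A minimal sketch of the missing-response trick, assuming SASHELP.CLASS as the modeling data and PROC REG as the modeling procedure. The data set names work.ToScore, work.Combined, and work.Pred, and the flag variable ScoreOnly, are hypothetical.

```sas
/* Hypothetical observations to score: predictors known, response missing. */
data work.ToScore;
   input Height Age;
   Weight = .;
   datalines;
60 13
65 15
;
run;

/* Stack the scoring rows under the modeling rows and flag them. */
data work.Combined;
   set sashelp.class work.ToScore(in=isNew);
   ScoreOnly = isNew;
run;

/* Rows with a missing Weight are left out of the fit,
   but they still receive a predicted value in the output data set. */
proc reg data=work.Combined plots=none;
   model Weight = Height Age;
   output out=work.Pred p=PredWeight;
run;
quit;

/* Keep only the predictions for the scoring rows. */
proc print data=work.Pred;
   where ScoreOnly = 1;
   var Height Age PredWeight;
run;
```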

See Rick's other blog post on scoring in general.

Toni94 · Posted 03-21-2019 11:57 AM

Thank You for reaching out. Will try the method for sure.

Toni94 · Posted 03-21-2019 02:52 PM

I haven't succeeded in getting the predictions. I am a complete novice...

Could you tell me whether my sequence of steps was right?

1) found the fitting model

2) made scoring data set

3) combined scoring data set with the data set that contains missing Y values using "set" statement

4) did the predictive regression modeling all over again

5) predicted Y values should appear in the "output data" section

StatsMan · Posted 03-22-2019 12:06 PM

Are you using one of the predefined tasks in SAS Studio or are you writing procedure code? Post your proc code, or the code generated by your SAS Studio task, or a screen shot of the SAS Studio task in question.

Toni94 · Posted 03-25-2019 03:37 PM

So, this is what the code looks like when doing predictive regression. I am using the predefined SAS tasks.

proc glmselect data=STAT1.AMESHOUSING3 plots=(criterionpanel);
   partition fraction(validate=0.2);
   class Heating_QC Central_Air Fireplaces / param=glm;
   model SalePrice=Heating_QC Central_Air Fireplaces Lot_Area Garage_Area
         Basement_Area Age_Sold / selection=backward
         (select=sbc choose=validate) hierarchy=single;
   score out=work.Score1 predicted residual;
   code file=sfile;
run;

filename sfile CLEAR;

Thank You for the help.

StatsMan · Posted 03-26-2019 07:57 AM

The task does not appear to have a way to score a new set of data by default. You can edit the code generated by the task, though. Click on Edit in the code window, and then insert DATA=YourDataName in the SCORE statement. By default, the SCORE statement scores the input data set for the procedure, the one specified through the DATA= option on the PROC statement. Adding your own DATA= to the SCORE statement changes the set of data for which you will get predictions. Want both? Then use a second SCORE statement.
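The edit described above might look roughly like this. It is only a sketch: work.NewHouses and work.ScoreNew are hypothetical names, the new data set is assumed to contain the same predictor variables as STAT1.AMESHOUSING3, and the CODE statement and PLOTS= option are left out to keep the example short.

```sas
proc glmselect data=STAT1.AMESHOUSING3;
   partition fraction(validate=0.2);
   class Heating_QC Central_Air Fireplaces / param=glm;
   model SalePrice=Heating_QC Central_Air Fireplaces Lot_Area Garage_Area
         Basement_Area Age_Sold / selection=backward
         (select=sbc choose=validate) hierarchy=single;
   /* First SCORE statement: scores the input data set, as the task does by default. */
   score out=work.Score1 predicted residual;
   /* Second SCORE statement: scores the new data (hypothetical name).           */
   /* RESIDUAL is omitted because SalePrice is presumably unknown for these rows. */
   score data=work.NewHouses out=work.ScoreNew predicted;
run;
```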

salammunshi · Posted 06-15-2019 11:10 PM

Thanks a lot, Toni94! I have started digging into your suggestions and will come back if I need more help.

Thanks again!

Mou

salammunshi · Posted 06-16-2019 02:29 AM

Following your suggestions, I went through the following process:

1) Fit the model

2) Take the scored variable for the target

3) Set the target response to 0 where P_1 is low, e.g. if P_1<.50 then response=0 (the incidence rate is very low, and most of the predictor values are missing because the customers have been inactive for a long time)

4) Run the logistic regression model again, using P_1 as a predictor

5) Fit the model

6) Score the new data set based on that model

Please find below the code I'm using:

/********************************************************************/
/* fit logistic regression model                                    */
/********************************************************************/
proc logistic data     = RPT_Response_XX_2_2
              outmodel = rpt_response_XX_model_param
              descending
              namelen  = 32;
   model response = &munvar P_new
         / selection = stepwise
           link      = logit
           outroc    = trn_outroc
           roceps    = 0.0001
           ctable
           pprob     = (0.00 to 1.00 by 0.01);
   output out       = XX_churn_mod_pred
          predprobs = individual;
run; quit;

/********************************************************************/
/* score dataset                                                    */
/********************************************************************/
proc logistic inmodel = rpt_response_XX_model_param;
   score data   = rpt_response_XX_all
         out    = RPT_Response_XX_scored_data
         outroc = trn_outroc;
run;

After obtaining the scored data sets, how should I interpret P_1? Does it also give probability scores for the same type of customers as those with a missing response (the real target)? I'm really stuck on this point. Your quick guidance would be highly appreciated.

Thanks again for your suggestions.
