🔒 This topic is solved and locked.
Toni94
Calcite | Level 5

Hello everyone!

I am using a student version of SAS and am trying to do predictive modeling, but I am stuck at the "scoring step". I have chosen a model and saved its scoring code, but I do not know how to apply it to another data set.

 

Can someone explain, please?

Accepted Solution
StatsMan
SAS Super FREQ

The answer will depend on which SAS Procedure you are using to develop your model.  If the procedure has a SCORE or CODE statement in its syntax, then use those statements to score your new data set.  If the procedure has a STORE statement, then you can store your model in a SAS item store and then score new observations later using PROC PLM.  See this example for help.
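As a minimal sketch of the STORE/PROC PLM route, the following fits a model with PROC GLM on SASHELP.CLASS (used only because it ships with every SAS installation), saves it to an item store, and then scores new observations with PROC PLM. The data set and item store names (WORK.CLASS_MODEL, WORK.KIDS_PLM, WORK.KIDS_PLM_SCORED) and the variable name Pred are hypothetical.

```sas
/* Fit the model once and save it to an item store */
proc glm data=sashelp.class;
   class Sex;
   model Weight = Height Sex;
   store out=work.class_model;   /* item store holding the fitted model */
run;
quit;

/* Hypothetical new observations: same predictors, no response needed */
data work.kids_plm;
   length Name $ 8 Sex $ 1;
   input Name $ Sex $ Height;
   datalines;
Ann F 58
Bob M 63
Cal M 67
;

/* Score the new observations from the stored model */
proc plm restore=work.class_model;
   score data=work.kids_plm out=work.kids_plm_scored predicted=Pred;
run;
```

WORK is a temporary library; to score in a later session, the item store would have to be saved to a permanent library instead.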

 

A trick that will work with any SAS modeling procedure is to add the scoring observations to the modeling data set.  Set the target or response variable to a missing value for those observations.  When the model is fit, only those observations with nonmissing responses will be used to create the model.  You can request predictions as you would for any other use of that modeling procedure.  See Rick's blog post on this topic for help. 
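A minimal sketch of this missing-response trick, using PROC REG and SASHELP.CLASS; the data set names WORK.TO_SCORE, WORK.COMBINED, WORK.COMBINED_PRED and WORK.TO_SCORE_PRED are hypothetical.

```sas
/* Hypothetical observations to score: the response (Weight) is unknown */
data work.to_score;
   length Name $ 8;
   input Name $ Height;
   Weight = .;               /* missing response, so these rows are not used in the fit */
   datalines;
Ann 58
Bob 63
Cal 67
;

/* Append the observations to score to the training data */
data work.combined;
   set sashelp.class(keep=Name Height Weight)
       work.to_score;
run;

/* The fit uses only rows with a nonmissing response, but OUTPUT
   writes a predicted value for every row with nonmissing predictors */
proc reg data=work.combined;
   model Weight = Height;
   output out=work.combined_pred p=Pred;
run;
quit;

/* Keep only the rows that were being scored */
data work.to_score_pred;
   set work.combined_pred;
   where missing(Weight);
run;
```

The final WHERE filter only isolates the scored rows because the training data in this sketch have no missing Weight values; with real data, a flag variable set in the append step is a safer way to separate them.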

 

See Rick's other blog post on scoring in general.


8 Replies

Toni94
Calcite | Level 5
Thank you for reaching out. I will definitely try that method.
Toni94
Calcite | Level 5

Unfortunately, I haven't managed to get the predictions. I am a complete novice...

 

Could you tell me whether these steps are right?

1) Found the best-fitting model

2) Made a scoring data set

3) Combined the scoring data set with the data set that contains missing Y values, using a SET statement

4) Ran the predictive regression modeling again

5) Looked for the predicted Y values in the "output data" section

StatsMan
SAS Super FREQ

Are you using one of the predefined tasks in SAS Studio or are you writing procedure code?  Post your proc code, or the code generated by your SAS Studio task, or a screen shot of the SAS Studio task in question.

Toni94
Calcite | Level 5

This is what the code looks like when I do predictive regression. I am using the predefined SAS tasks.

 

proc glmselect data=STAT1.AMESHOUSING3 plots=(criterionpanel);
   partition fraction(validate=0.2);
   class Heating_QC Central_Air Fireplaces / param=glm;
   model SalePrice=Heating_QC Central_Air Fireplaces Lot_Area Garage_Area
         Basement_Area Age_Sold / selection=backward
         (select=sbc choose=validate) hierarchy=single;
   score out=work.Score1 predicted residual;
   code file=sfile;
run;

filename sfile CLEAR;
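The CODE statement writes DATA step scoring code to the file referenced by SFILE. One common way to apply that saved code to another data set is to %INCLUDE the file inside a DATA step that reads the new data, before the fileref is cleared. A minimal, self-contained sketch, with SASHELP.CLASS standing in for STAT1.AMESHOUSING3 and hypothetical data set names WORK.KIDS_TO_SCORE and WORK.KIDS_SCORED_CODE:

```sas
filename sfile temp;   /* temporary file that will hold the generated scoring code */

/* Fit the model and write DATA step scoring code to SFILE */
proc glmselect data=sashelp.class;
   class Sex / param=glm;
   model Weight = Height Age Sex / selection=backward;
   code file=sfile;
run;

/* Hypothetical new observations: all model variables except the response */
data work.kids_to_score;
   length Name $ 8 Sex $ 1;
   input Name $ Sex $ Age Height;
   datalines;
Ann F 12 58
Bob M 13 63
Cal M 15 67
;

/* Apply the saved scoring code to the new data */
data work.kids_scored_code;
   set work.kids_to_score;
   %include sfile;   /* inserts the generated scoring statements here */
run;

filename sfile clear;
```

The generated code adds a predicted-value variable to the output data set; check its exact name in the included code or with PROC CONTENTS rather than assuming one.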

 

 

Thank you for the help.

StatsMan
SAS Super FREQ

The task does not appear to have a way to score a new set of data by default.  You can edit the code generated by the task, though.  Click on Edit in the code window, and then insert DATA=YourDataName in the SCORE statement.  By default, the SCORE statement scores the input data set for the procedure, the one specified through the DATA= option on the PROC statement.  Adding your own DATA= to the SCORE statement changes the set of data for which you will get predictions.  Want both?  Then use a second SCORE statement.
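A minimal sketch of that edit, assuming SASHELP.CLASS stands in for the Ames housing data (the PARTITION statement is left out so the result is reproducible). WORK.KIDS_NEW is a hypothetical data set that contains every model variable except the response, and PredWeight/ResWeight are hypothetical output variable names.

```sas
/* Hypothetical new observations: all model variables except the response */
data work.kids_new;
   length Name $ 8 Sex $ 1;
   input Name $ Sex $ Age Height;
   datalines;
Ann F 12 58
Bob M 13 63
Cal M 15 67
;

proc glmselect data=sashelp.class;
   class Sex / param=glm;
   model Weight = Height Age Sex / selection=backward;
   /* No DATA= option: scores the DATA= set on the PROC statement */
   score out=work.class_scored predicted=PredWeight residual=ResWeight;
   /* DATA= option: scores the new observations instead */
   score data=work.kids_new out=work.kids_new_scored predicted=PredWeight;
run;
```

Residuals are requested only for the training data, because the new observations have no response to compare against.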

salammunshi
Calcite | Level 5

Thanks a lot, Toni94! I have started digging into your suggestions and will come back if I need more help.

Thanks again!

Mou

salammunshi
Calcite | Level 5

Following your suggestions, I used this process:

 

1) Fit the model

2) Took the score variable for the target (P_1)

3) Set the target response to 0 where P_1 is low, e.g. if P_1<.50 then response=0  /* The incidence rate is very low, and most predictor values are missing because the customers have been inactive for a long time */

4) Ran the logistic regression model again, using P_1 as a predictor

5) Fit the model

6) Scored the new data set based on that model.

 

Here is the code I'm using:

 /********************************************************************/
 /*  fit logistic regression model                                   */
 /********************************************************************/
 proc logistic data=RPT_Response_XX_2_2
               outmodel=rpt_response_XX_model_param
               descending
               namelen=32;
    model response = &munvar P_new
          / selection=stepwise
            link=logit
            outroc=trn_outroc
            roceps=0.0001
            ctable
            pprob=(0.00 to 1.00 by 0.01);
    output out=XX_churn_mod_pred
           predprobs=individual;
 run;
 quit;

 

 /********************************************************************/
 /*  score dataset                                                   */
 /********************************************************************/
 proc logistic inmodel=rpt_response_XX_model_param;
    score data=rpt_response_XX_all
          out=RPT_Response_XX_scored_data
          outroc=scr_outroc;   /* separate name so the training ROC data set trn_outroc is not overwritten */
 run;
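In PROC LOGISTIC, the OUT= data set of the SCORE statement holds one P_ variable per response level with the model's predicted probability of that level (for a 0/1 response, P_1 is the estimated probability that the response equals 1, given that observation's predictor values), an I_ variable with the level the observation is classified into, and, when the response is present in the scored data, an F_ variable with the observed level. A minimal sketch using the same OUTMODEL=/INMODEL= pattern, with a hypothetical binary response (Heavy) built from SASHELP.CLASS:

```sas
/* Hypothetical binary target: 1 if Weight exceeds 100 pounds, else 0 */
data work.class_bin;
   set sashelp.class;
   Heavy = (Weight > 100);
run;

/* Fit the model and save its parameters */
proc logistic data=work.class_bin outmodel=work.heavy_model;
   model Heavy(event='1') = Height;
run;

/* Score from the saved parameters */
proc logistic inmodel=work.heavy_model;
   score data=work.class_bin out=work.heavy_scored;
run;

/* P_1 = estimated Pr(Heavy = 1), P_0 = 1 - P_1, I_Heavy = predicted level */
proc print data=work.heavy_scored(obs=5);
   var Name Height Heavy P_0 P_1 I_Heavy;
run;
```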

 

After getting the scored data sets, how should I interpret P_1? Is it also a probability score for customers of the same type as those with a missing response (the real target)? I'm really stuck on that point. Your quick guidance is highly appreciated.

 

Thanks again for your suggestions.

 


Discussion stats
  • 8 replies
  • 1683 views
  • 3 likes
  • 3 in conversation