PaigeMiller
Diamond | Level 26

You want to fit a model to the Training data set, and then apply the fitted model from the training data set to the validation data set. This is not what you have done ... you have fit a whole new model to the validation data set.

 

Here is an example of how to apply the fitted model to the validation data set: http://support.sas.com/kb/39/724.html

--
Paige Miller
GreggB
Pyrite | Level 9

I hope this is what you mean, because I get lost past this step (I'm using the SAS support article as a guide).

 

/* Using the SAS support article as a guide: */
/* 1. Fit the model to the training data set. */
/* 2. Include a SCORE statement to apply the fitted model to VALID. */
ods graphics on;
proc logistic data=train;
  model camp_flag(event="1") = rit / outroc=troc;
  score data=valid out=valpred outroc=vroc;
  roc;
  roccontrast;
run;
PaigeMiller
Diamond | Level 26

Looks good to me.

--
Paige Miller
GreggB
Pyrite | Level 9

I hope we're near a crescendo. Can I take a data set like the example below and use one of these existing data sets (valpred, vroc or troc) to fill in the missing values with a predicted value, or to give a probability that the event will occur?

 

 

data newstuff;
input RIT camp_flag;
datalines;
240 .
200 .
150 .
;

 

PaigeMiller
Diamond | Level 26

As I said earlier, the SCORE statement will give you predicted values on the new data set. Example in the documentation.

--
Paige Miller
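For the NEWSTUFF example, a minimal sketch of the SCORE approach might look like the following. It assumes the TRAIN data set and the model from earlier in the thread are still available; the output data set name newpred is made up for illustration. The model is refit on TRAIN and then applied to NEWSTUFF. It does not reuse valpred, vroc or troc, since those hold predictions and ROC coordinates rather than the fitted model.

```sas
/* Sketch only: refit on TRAIN, then score the new observations. */
/* NEWPRED is a hypothetical output data set name.               */
proc logistic data=train;
  model camp_flag(event="1") = rit;
  score data=newstuff out=newpred;
run;

/* With a 0/1 response, the scored data set should contain P_1   */
/* (predicted probability of camp_flag=1) and I_camp_flag (the   */
/* predicted level). Check the actual names in your own output.  */
proc print data=newpred;
  var rit P_1 I_camp_flag;
run;
```

The missing camp_flag values in NEWSTUFF are not a problem here, because scoring only needs the predictor RIT.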
Reeza
Super User
Or you can use a CODE statement to get a program you can run directly on any new values within your data step.
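A rough sketch of the CODE statement route, assuming the same TRAIN and NEWSTUFF data sets. The file name score_camp.sas and the data set name newpred are hypothetical, and the file is written to the current working directory unless a full path is given.

```sas
/* Sketch only: write DATA step scoring code to a file.          */
/* "score_camp.sas" is a hypothetical file name.                 */
proc logistic data=train;
  model camp_flag(event="1") = rit;
  code file="score_camp.sas";
run;

/* Run the generated scoring code against any new data.          */
data newpred;
  set newstuff;
  %include "score_camp.sas";
run;
```

The generated code adds predicted-probability variables to the output data set. Their names are chosen by the procedure, so check them with PROC CONTENTS before relying on a particular name.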
GreggB
Pyrite | Level 9

Do you have a favorite article on this topic you could point me to?

Reeza
Super User
You use PROC SCORE or PROC PLS to score your new data set. PLS has more options these days as it's the 'newest' procedure. Remember to specify the option for logistic regression, though; otherwise it doesn't exponentiate the estimate.
PaigeMiller
Diamond | Level 26

@Reeza wrote:
You use PROC SCORE or PROC PLS to score your new data set. PLS has more options these days as it's the 'newest' procedure. Remember to specify the option for logistic regression, though; otherwise it doesn't exponentiate the estimate.

Do you mean PROC PLM?

--
Paige Miller
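If PROC PLM is what was meant, a minimal sketch would save the fitted model with a STORE statement and score from the item store later. The item store name camp_model, the output data set newpred, and the variable name p_camp are all made up for illustration. The ILINK option is presumably the "option for logistic regression" mentioned above: it converts the linear predictor back to the probability scale.

```sas
/* Sketch only: save the fitted model in an item store.          */
/* CAMP_MODEL is a hypothetical item store name.                 */
proc logistic data=train;
  model camp_flag(event="1") = rit;
  store camp_model;
run;

/* Score new data from the stored model. Without ILINK the       */
/* predicted value is on the logit scale, not a probability.     */
proc plm restore=camp_model;
  score data=newstuff out=newpred predicted=p_camp / ilink;
run;
```

One advantage of this route is that the model does not need to be refit, and the training data is not needed, each time new data arrives.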
Reeza
Super User

So if a student has already attended they cannot attend again or they’re not going to be recommended to attend even if their test scores warrant it? 


@GreggB wrote:

They would attend only once. I can de-duplicate by Student ID to make sure.

 

I think I read about what you're saying - the data is divided into 2 sets using ranuni. One set is used to create the model and the other half is used for prediction?
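The split described here could be sketched roughly as follows. The input data set name students and the seed value are assumptions; TRAIN and VALID match the names used in the PROC LOGISTIC step earlier in the thread.

```sas
/* Sketch only: random 50/50 split into training and validation. */
/* STUDENTS is a hypothetical input data set; 12345 is an        */
/* arbitrary seed, fixed so that the split is reproducible.      */
data train valid;
  set students;
  if ranuni(12345) < 0.5 then output train;
  else output valid;
run;
```

The model is then fit on TRAIN only, and VALID is used to check how well it predicts data it has not seen.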


 

GreggB
Pyrite | Level 9

The summer camp is for grade 3 only. The only way a student would attend twice would be if they are retained in grade 3 and they score low enough both times to be flagged for attendance at the summer camp. Since all the data sets have a unique student ID, I can easily find scenarios like this if they occurred.
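One possible way to find such repeat students, assuming the yearly data sets have been stacked into a single data set. The names all_years, student_id, unique_students and repeat_students are hypothetical.

```sas
/* Sketch only: keep the first record per student and send any   */
/* additional records for the same ID to a separate data set.    */
/* ALL_YEARS and STUDENT_ID are hypothetical names.              */
proc sort data=all_years out=unique_students
          dupout=repeat_students nodupkey;
  by student_id;
run;
```

If repeat_students comes out empty, no student appears more than once.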

Reeza
Super User
So not the same students each year, that's better then. I'd definitely remove those records but you do need to account for them somehow.
