edasdfasdfasdfa
Quartz | Level 8

Hello,

 

So I've split the training data into training and validation sets. I fitted a model on the training set and then used that model to score the validation set. Now I'm unsure how to go about scoring/predicting the test set. The original model does not seem to work, because the test data does not have the response variable, unlike the training/validation data.

Any help appreciated.

edasdfasdfasdfa
Quartz | Level 8

Here is what I have:

/* Fit on the training data and save the fitted model with OUTMODEL= */
proc logistic data=traina outmodel=model1;
    class Gender Married Self_Employed Property Education N_Dependents (ref='1') / param=ref;
    model Lo_Status(event='1') = Credit_History Education N_Dependents Self_Employed
                                 L_Amount_Term Gender Married Property L_Amount
                                 Applicant Co_Applicant;
run;

/* Score the validation data with the saved model (no refit) */
proc logistic inmodel=model1;
    score data=valid out=validacc fitstat;
run;

 

So how would I now predict accuracy for the test set? (As mentioned, my response variable is not in the test data set, so technically it is not the same model?)

Reeza
Super User
Look into PROC PLM, though score should work. You don't need the response variable to score the data but you do for an accuracy measurement.
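For reference, a minimal sketch of the PROC PLM route. It assumes the model is refit with a STORE statement and that the test data set is called TEST; the names TEST, LOGIT_STORE, TEST_PLM and P_HAT are made up for illustration and are not from the original post.

```sas
/* Refit on the training data and save the fit in an item store.
   LOGIT_STORE is a hypothetical name. */
proc logistic data=traina;
    class Gender Married Self_Employed Property Education N_Dependents (ref='1') / param=ref;
    model Lo_Status(event='1') = Credit_History Education N_Dependents Self_Employed
                                 L_Amount_Term Gender Married Property L_Amount
                                 Applicant Co_Applicant;
    store work.logit_store;
run;

/* Apply the stored model to new data. TEST (hypothetical) needs the
   predictors but not Lo_Status. ILINK puts P_HAT on the probability scale. */
proc plm restore=work.logit_store;
    score data=test out=test_plm predicted=p_hat / ilink;
run;
```

Either route applies the already-fitted model; neither one refits anything on the test data.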
edasdfasdfasdfa
Quartz | Level 8

Aren't scoring and accuracy really the same thing? That is, when you score a validation set, aren't you predicting an accuracy? It just happens to be an accuracy on the actual examples the model was constructed on, whereas for the test set the scoring is accuracy on unseen data?


Reeza
Super User
Not afaik, at least not within SAS. Scoring just means calculating the predicted value for each observation from the fitted model.
edasdfasdfasdfa
Quartz | Level 8

So what would be the code for making a prediction?

Reeza
Super User
Your code above does that; the predictions are in the validACC data set. Because you used INMODEL, the model is the same: it's not refit to the new data, it's just scored. The documentation has a fully worked example here:
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_examples20.htm&docsetVer...
edasdfasdfasdfa
Quartz | Level 8

I actually looked at that link for an example while I was working on the code. So you are saying that in the code I showed above, since I am not refitting, I am producing an accuracy for the validation data set, and it is stored in the validacc data set?

Reeza
Super User
The FITSTAT option will give you the AIC-type statistics; those should just be in the results window. The scored data with the predicted values should be in the validacc data set.
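As a rough sketch of how scoring and accuracy differ: when the scored data contain the response, the OUT= data set should hold the observed level (F_Lo_Status) next to the predicted level (I_Lo_Status), and accuracy comes from comparing the two. This assumes the default variable names that the SCORE statement generates for a response called Lo_Status.

```sas
/* Cross-tabulate observed vs. predicted class in the scored validation data.
   F_Lo_Status = observed level, I_Lo_Status = predicted level (default names). */
proc freq data=validacc;
    tables F_Lo_Status * I_Lo_Status / norow nocol;
run;
```

The share of observations on the diagonal of that table is the classification accuracy. A scored test set without Lo_Status would have no F_Lo_Status column, so no such comparison is possible there.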
Reeza
Super User

If you want the fit statistics in a data set, add this line to the PROC LOGISTIC step that scores the data:

ods output scorefitstat=want;

The results will be in the WANT data set.
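Put together with the scoring step from earlier in the thread, that might look like the sketch below. The PROC PRINT step is only there to look at the result.

```sas
/* Capture the FITSTAT table from the scoring step in the WANT data set. */
ods output ScoreFitStat=want;

proc logistic inmodel=model1;
    score data=valid out=validacc fitstat;
run;

/* Inspect the captured fit statistics. */
proc print data=want;
run;
```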

edasdfasdfasdfa
Quartz | Level 8

I'm still a bit confused about how I would fit a model WITHOUT a response variable (the test data does not have one). I need to score the test set, but I can't score it if I can't create a model from the test data set.

Reeza
Super User
You can't fit a model without a response variable. But scoring is applying an existing model to new data. If you want to run a new model, use a new proc logistic step.
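For the test set, that could look like the sketch below. It reuses MODEL1 from the fitting step and assumes the test data set is called TEST (a hypothetical name, as is TEST_SCORED) and contains the same predictor variables.

```sas
/* Apply the saved model to the test data. No refit happens, so TEST
   does not need Lo_Status. FITSTAT is left off because fit statistics
   need the observed response. TEST and TEST_SCORED are hypothetical names. */
proc logistic inmodel=model1;
    score data=test out=test_scored;
run;

/* Look at the first few predictions: the predicted level and the
   predicted probabilities are added as new columns. */
proc print data=test_scored(obs=10);
run;
```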
edasdfasdfasdfa
Quartz | Level 8

I want to create/run a new model, but how do I do that using PROC LOGISTIC when the test data set does not have a response variable? Can you actually create a model without a response variable?

Sorry if I am misunderstanding you.

Reeza
Super User
You cannot create a model without a response variable.

