joanatomeribeir
Fluorite | Level 6

Hello,

 

I'm a new SAS user, so apologies in advance if these questions are basic. 🙂

 

I'm fitting a binary logistic regression to predict my target variable (inactive=0, active=1), and I've randomly split the data into training (70%) and test (30%) sets.

 

I used PROC LOGISTIC to fit the model, and now I need to determine whether it is overfitting or underfitting the data.

 

Does anyone have suggestions for diagnosing overfitting with PROC LOGISTIC? Is it possible to produce learning curves?

 

proc logistic data=train;
   class country gender / param=glm;
   model y(event='1') = income var2 var3 / link=logit ctable
         selection=backward slstay=0.05 hierarchy=single
         technique=fisher outroc=troc maxiter=50;
   score data=test out=valpred outroc=vroc;
   roc; roccontrast;
run;

 

Thank you all in advance,

Joana

 

12 REPLIES
Reeza
Super User
ROC curves and ROC contrasts are the plots usually used. Is that what you mean by learning curves?

You can also look at PROC PLM.
joanatomeribeir
Fluorite | Level 6
By learning curves, I meant to plot the loss of the train and test over time to understand if the model is overfitted or not.
PaigeMiller
Diamond | Level 26

@joanatomeribeir wrote:
By learning curves, I meant to plot the loss of the train and test over time to understand if the model is overfitted or not.

What do you mean by "loss of the train and test over time"? The general definition of overfitting does not include a time-related component.

--
Paige Miller
PaigeMiller
Diamond | Level 26

You can compare the training and validation data sets using PROC LOGISTIC:

http://support.sas.com/kb/39/724.html

--
Paige Miller
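
One way to make that comparison concrete is to score both partitions and request fit statistics for each. This is only a sketch: it reuses the data set and variable names from the original post, drops the selection options for brevity, uses hypothetical output data set names (trainpred, testpred), and assumes the FITSTAT option of the SCORE statement is available in your SAS/STAT release.

```sas
/* Sketch: fit on the training data, then score both partitions.       */
/* FITSTAT prints fit statistics for each scored data set, so the      */
/* training and test results can be compared side by side.             */
/* trainpred and testpred are hypothetical output data set names.      */
proc logistic data=train;
   model y(event='1') = income var2 var3;
   score data=train out=trainpred fitstat;   /* in-sample statistics */
   score data=test  out=testpred  fitstat;   /* held-out statistics  */
run;
```

Training statistics that are much better than the test statistics point toward overfitting; poor statistics on both point toward underfitting.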
joanatomeribeir
Fluorite | Level 6
I did that: I scored the test data with the SCORE statement in PROC LOGISTIC. But I still want to understand whether the model is overfitted or not.
joanatomeribeir
Fluorite | Level 6

Sorry, I didn't mean over time. I meant over training set size, to understand whether the model is overfitting, underfitting, or fitting the data well.

[Image: joanatomeribeir_0-1591693274688.png]

 

PaigeMiller
Diamond | Level 26

What are the two lines in these graphs? Are they the confidence intervals of the logistic regression model coefficients? Please be specific.

--
Paige Miller
joanatomeribeir
Fluorite | Level 6

I'm sorry, you are right! The two lines are the training and validation sets, and I need to compare them to understand the bias and variance of the model.

 

[Image: joanatomeribeir_0-1591701833589.png]

More specifically, I would like to know whether, for a binary logistic regression, it makes sense to plot the log loss (y axis) against the training set size (x axis) instead of plotting MSE.

[Image: joanatomeribeir_1-1591702209451.png]

 

I'm quite lost on this issue.
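
For reference, the log loss usually meant in this context is the average negative log-likelihood of the binary outcomes. The notation below is an assumption, not taken from the thread: $y_i \in \{0,1\}$ is the observed target, $\hat{p}_i$ is the predicted probability that $y_i = 1$, and $n$ is the number of observations in the data set being evaluated.

```latex
\mathrm{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n}
  \Bigl[ y_i \log \hat{p}_i + (1 - y_i) \log\bigl(1 - \hat{p}_i\bigr) \Bigr]
```

Because logistic regression is fit by maximizing the log-likelihood, this is the quantity the model minimizes on the training data, which is the usual argument for plotting it rather than MSE.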

PaigeMiller
Diamond | Level 26

As far as I know, there is no built-in method in PROC LOGISTIC or PROC HPLOGISTIC to do this.

--
Paige Miller
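
Since nothing built in does this, a learning curve can be assembled by hand: refit the model on growing random subsamples of the training data and record the mean log loss on each subsample and on the test data. The macro below is a rough sketch under several assumptions: y is numeric and coded 0/1 (so the scored probability is named P_1), the data sets train and test come from the original post, and the macro name, fractions, seed, and work data set names are all hypothetical. Model selection is left out to keep the loop simple.

```sas
/* Sketch of a manual learning curve. Hypothetical names: learning_curve, */
/* sub, p_train, p_test, ll, m, curve. Assumes y is numeric 0/1.          */
%macro learning_curve(fractions=0.1 0.2 0.4 0.6 0.8 1.0, seed=12345);
   %local i frac;

   /* Start from an empty result table */
   proc datasets lib=work nolist nowarn;
      delete curve;
   quit;

   %let i = 1;
   %do %while (%length(%scan(&fractions, &i, %str( ))) > 0);
      %let frac = %scan(&fractions, &i, %str( ));

      /* Simple random subsample of the training data */
      proc surveyselect data=train out=sub method=srs
                        samprate=&frac seed=&seed noprint;
      run;

      /* Fit on the subsample, then score the subsample and the test data */
      proc logistic data=sub noprint;
         model y(event='1') = income var2 var3;
         score data=sub  out=p_train;
         score data=test out=p_test;
      run;

      /* Per-observation log loss; P_1 is the predicted probability of y=1 */
      data ll;
         set p_train(in=a) p_test;
         length part $5;
         part    = ifc(a, 'train', 'test');
         p       = min(max(P_1, 1e-15), 1 - 1e-15);  /* avoid log(0) */
         logloss = -(y*log(p) + (1 - y)*log(1 - p));
         frac    = &frac;
      run;

      /* Mean log loss per partition for this training fraction */
      proc means data=ll noprint nway;
         class frac part;
         var logloss;
         output out=m(drop=_type_ _freq_) mean=logloss;
      run;

      proc append base=curve data=m;
      run;

      %let i = %eval(&i + 1);
   %end;

   /* One line per partition: training versus test log loss */
   proc sgplot data=curve;
      series x=frac y=logloss / group=part markers;
      xaxis label="Fraction of training data used";
      yaxis label="Mean log loss";
   run;
%mend learning_curve;

%learning_curve()
```

A persistent gap with the test curve well above the training curve suggests overfitting (high variance); two curves that converge at a high log loss suggest underfitting (high bias).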
Ksharp
Super User

If your SAS release is later than 9.4M4, you could try the GOF option:

model .........../gof ;

@Rick_SAS wrote a blog post about it that compares the statistical and machine learning approaches.

 

If your SAS version is older, try the LACKFIT option:

model ........../ lackfit   firth ;

Ksharp
Super User
You could also try PROC HPGENSELECT with a PARTITION statement, as described in Rick's blog post that I mentioned.
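
A minimal sketch of that idea, not taken from the blog post: alldata is a hypothetical name for the data before it was split, the variables are those from the original post, and the 30% validation fraction mirrors the original 70/30 split. With a PARTITION statement the procedure holds out the validation rows itself and should report fit statistics for the training and validation roles separately.

```sas
/* Sketch: let the procedure partition the data and report fit           */
/* statistics per role. alldata is a hypothetical unsplit data set.      */
proc hpgenselect data=alldata;
   model y(event='1') = income var2 var3 / dist=binary link=logit;
   partition fraction(validate=0.3);   /* 30% held out for validation */
   selection method=backward;
run;
```

Comparing the training and validation columns of the fit statistics gives the same kind of overfitting check as scoring a separate test set.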
