Hello,
I'm a new SAS user, so first of all, sorry if these questions are dumb. 🙂
I'm doing a binary logistic regression to predict my target variable (inactive=0, active=1), and I've randomly split the data into training (70%) and testing (30%) sets.
I used PROC LOGISTIC to fit the model, and now I need to understand whether it is overfitting or underfitting the data.
Does anyone have suggestions for analyzing overfitting with PROC LOGISTIC? Is it possible to produce learning curves?
proc logistic data=train;
   class country gender / param=glm;
   model y(event='1') = income var2 var3 / link=logit ctable
         selection=backward slstay=0.05 hierarchy=single technique=fisher outroc=troc maxiter=50;
   score data=test out=valpred outroc=vroc;
   roc; roccontrast;
run;
Thank you all in advance,
Joana
@joanatomeribeir wrote:
By learning curves, I meant to plot the loss of the train and test over time to understand if the model is overfitted or not.
What do you mean by "loss of the train and test over time"? The general definition of overfitting does not include a time-related component.
You can compare the training and validation data sets using PROC LOGISTIC:
http://support.sas.com/kb/39/724.html
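One way to make that comparison concrete is to score both data sets in the same PROC LOGISTIC step and request fit statistics for each. The following is only a minimal sketch: it reuses the data set and variable names from the original post, drops the selection options for brevity, and the output name trainpred is a hypothetical addition.

```sas
/* Sketch: fit on the training data, then score train and test.       */
/* FITSTAT on each SCORE statement prints fit statistics (including   */
/* log likelihood, error rate, AUC, and Brier score) per scored set.  */
proc logistic data=train;
   model y(event='1') = income var2 var3;
   score data=train out=trainpred fitstat;  /* trainpred: hypothetical name */
   score data=test  out=valpred  fitstat;
run;
```

If the statistics are much better on the training data than on the test data, that gap is a sign of overfitting. Similar but poor values on both suggest underfitting.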
Sorry, I didn't mean over time. I meant over training set size, to understand whether the model is overfitting, underfitting, or fitting the data well.
What are the two lines in these graphs? Are they the confidence intervals of the logistic regression model coefficients? Please be specific.
I'm sorry, you are right! Basically, the two lines are the training set and the validation set, and I need to compare them to understand the bias and variance of the model.
More specifically, I would like to understand whether, for a binary logistic regression, it makes sense to plot the log loss (y axis) against the training set size (x axis) instead of plotting MSE.
I'm quite lost on this issue.
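For reference, the log loss mentioned here is usually defined as the average negative log-likelihood of the observed responses. The notation below is introduced only for illustration: N is the number of observations, y_i is the observed 0/1 response, and \hat{p}_i is the predicted probability that y_i = 1.

```latex
\mathrm{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log \hat{p}_i + (1 - y_i) \log\left(1 - \hat{p}_i\right) \right]
```

Because logistic regression is fit by maximum likelihood, this is the same quantity (up to sign and the 1/N scaling) that the fit optimizes on the training data, which is why it is a common choice for the y axis of a learning curve for a binary model.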
As far as I know, there is no built-in method in PROC LOGISTIC or PROC HPLOGISTIC to do this.
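A learning curve can still be assembled by hand with a macro loop. The following is only a rough sketch under several assumptions: the macro name, the fracs= list, the seed, and the work data sets (sub, s_train, s_test, both, one, lc) are all hypothetical, the model is a fixed simplification of the one in the original post, and P_1 is the predicted probability of y=1 that the SCORE statement writes for a 0/1 response.

```sas
%macro learning_curve(fracs=0.1 0.2 0.4 0.6 0.8, seed=12345);
   %local i f;

   /* Start from an empty results table */
   proc datasets lib=work nolist nowarn;
      delete lc;
   quit;

   %do i = 1 %to %sysfunc(countw(&fracs, %str( )));
      %let f = %scan(&fracs, &i, %str( ));

      /* Draw a simple random subsample of the training data */
      proc surveyselect data=train out=sub method=srs
                        samprate=&f seed=&seed noprint;
      run;

      /* Refit on the subsample, then score the subsample and the test set */
      proc logistic data=sub noprint;
         model y(event='1') = income var2 var3;
         score data=sub  out=s_train;
         score data=test out=s_test;
      run;

      /* Per-observation log loss for both scored sets */
      data both;
         set s_train(in=intrain) s_test;
         length part $5;
         part = ifc(intrain, 'train', 'test');
         p    = min(max(P_1, 1e-15), 1 - 1e-15);   /* avoid log(0) */
         ll   = -( y*log(p) + (1 - y)*log(1 - p) );
      run;

      /* Mean log loss per set, tagged with the sampling fraction */
      proc means data=both noprint nway;
         class part;
         var ll;
         output out=one(drop=_type_ _freq_) mean=logloss;
      run;

      data one;
         set one;
         frac = &f;
      run;

      proc append base=lc data=one;
      run;
   %end;

   /* One line per set: training loss and test loss vs. sample fraction */
   proc sgplot data=lc;
      series x=frac y=logloss / group=part markers;
      xaxis label='Fraction of training data used';
      yaxis label='Mean log loss';
   run;
%mend learning_curve;

%learning_curve()
```

A persistent gap between the two lines, with the training loss well below the test loss, points to overfitting. Two lines that converge at a high loss point to underfitting. With very small fractions the fit may be unstable, so the leftmost points should be read with caution.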
If your SAS version is later than 9.4M4, you could try the GOF option:
model .........../gof ;
@Rick_SAS wrote a blog post about this that compares the statistical and machine learning approaches.
If your SAS version is older, try the LACKFIT option:
model ........../ lackfit firth ;