Hello,
I'm a new SAS user, so first of all, I'm sorry if these questions are dumb. 🙂
I'm fitting a binary logistic regression to predict my target variable (inactive=0, active=1), and I've randomly split the data into training (70%) and testing (30%) sets.
I used PROC LOGISTIC to fit the model, and now I need to understand whether it is overfitting or underfitting the data.
Does anyone have suggestions for analyzing overfitting with PROC LOGISTIC? Is it possible to produce learning curves?
proc logistic data=train;
   class country gender / param=glm;
   model y(event='1') = income var2 var3
         / link=logit ctable selection=backward slstay=0.05
           hierarchy=single technique=fisher outroc=troc maxiter=50;
   score data=test out=valpred outroc=vroc;
   roc;
   roccontrast;
run;
Thank you all in advance,
Joana
You can also look at PROC PLM.
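A minimal sketch of what the PROC PLM route could look like, assuming the `train` and `test` data sets and the model variables from the question. The item store name `work.logit_fit`, the output data set `test_scored`, and the variable name `p_active` are placeholders chosen for illustration, and the SELECTION= options are left out to keep the example short.

```sas
/* Fit on the training data and save the fitted model in an item store. */
/* The item store name work.logit_fit is an assumption.                 */
proc logistic data=train;
   model y(event='1') = income var2 var3;
   store work.logit_fit;
run;

/* Restore the stored model and score the hold-out data.                */
/* ILINK returns predictions on the probability scale, not the logit.   */
proc plm restore=work.logit_fit;
   score data=test out=test_scored predicted=p_active / ilink;
run;
```

The scored data set can then be summarized however you like (for example, by error rate or log loss) and compared with the same summary on the training data.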
@joanatomeribeir wrote:
By learning curves, I meant to plot the loss of the train and test over time to understand if the model is overfitted or not.
What do you mean by "loss of the train and test over time"? The general definition of overfitting does not include a time-related component.
Paige Miller
You can compare the training and validation data sets using PROC LOGISTIC:
http://support.sas.com/kb/39/724.html
Paige Miller
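One way to make this comparison concrete, sketched under the assumption that the `train` and `test` data sets from the question both contain the response `y`: the FITSTAT option of the SCORE statement asks PROC LOGISTIC to report fit statistics for each scored data set, so scoring both sets in one run puts the numbers side by side. The output data set names are placeholders, and this is not necessarily the approach taken in the linked note.

```sas
/* Score both the training and the test data with the same fitted model */
/* and request fit statistics for each scored data set.                 */
proc logistic data=train;
   model y(event='1') = income var2 var3;
   score data=train out=train_scored fitstat;  /* in-sample fit     */
   score data=test  out=test_scored  fitstat;  /* out-of-sample fit */
run;
```

If the statistics for the test data (log likelihood, error rate, AUC, Brier score and so on) are much worse than those for the training data, that points toward overfitting. If both are poor and similar, underfitting is the more likely explanation.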
Sorry, I didn't mean over time. I meant over training set size, to understand whether the model is overfitting, underfitting, or fitting the data well.
What are the two lines in these graphs? Are they the confidence intervals of the logistic regression model coefficients? Please be specific.
Paige Miller
I'm sorry, you are right! The two lines are the training set and the validation set, and I need to compare them to understand the model's bias and variance.
More specifically, I would like to know whether, for a binary logistic regression, it makes sense to plot log loss (y-axis) against training set size (x-axis) instead of plotting MSE.
I'm quite lost on this issue.
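For reference, log loss for a binary target is usually defined as the average negative log-likelihood of the observed labels. The notation here is introduced for illustration: $n$ is the number of observations, $y_i \in \{0, 1\}$ is the observed target, and $\hat{p}_i$ is the predicted probability that $y_i = 1$.

```latex
\mathrm{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \log \hat{p}_i + (1 - y_i) \log\left(1 - \hat{p}_i\right) \right]
```

Because maximum likelihood logistic regression minimizes exactly this quantity (up to the factor $1/n$) on the training data, it is a natural loss to plot for this model. The mean squared error of the predicted probabilities is the Brier score, which can also be plotted but is not what the fit optimizes.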
As far as I know, there is no built-in method in PROC LOGISTIC or PROC HPLOGISTIC to do this.
Paige Miller
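Since nothing is built in, a learning curve can be assembled by hand. The following is a rough sketch, not a tested recipe: it refits the model on growing random subsamples of `train`, scores both the subsample and `test`, and plots mean log loss against the fraction of training data used. It assumes `y` is numeric 0/1 (so the predicted probability of the event is in `P_1`), keeps the model fixed rather than rerunning backward selection, and uses a macro name, fractions, seed, and work data set names that are all made up for illustration.

```sas
/* Sketch only: macro name, fractions, seed and work data sets are assumptions. */
%macro learning_curve(fractions=0.1 0.2 0.4 0.6 0.8 1.0, seed=12345);
   %local i frac;

   /* Start from a clean results table */
   proc datasets lib=work nolist nowarn;
      delete lc_all;
   quit;

   %let i = 1;
   %let frac = %scan(&fractions, &i, %str( ));
   %do %while (%length(&frac) > 0);

      /* Simple random subsample of the training data */
      proc surveyselect data=train out=sub method=srs
                        samprate=&frac seed=&seed noprint;
      run;

      /* Fit on the subsample, then score the subsample and the test set */
      proc logistic data=sub noprint;
         model y(event='1') = income var2 var3;
         score data=sub  out=sc_train;
         score data=test out=sc_test;
      run;

      /* Per-observation log loss; assumes y is numeric 0/1 */
      data lc_one;
         length sample $5;
         set sc_train(in=in_train) sc_test;
         if missing(P_1) or missing(y) then delete;
         sample = ifc(in_train, 'train', 'test');
         /* Clip probabilities so that log() is always defined */
         p = min(max(P_1, 1e-15), 1 - 1e-15);
         loss = -(y*log(p) + (1 - y)*log(1 - p));
         frac = &frac;
         keep sample frac loss;
      run;

      /* Mean log loss for train and test at this fraction */
      proc means data=lc_one noprint nway;
         class sample frac;
         var loss;
         output out=lc_mean(drop=_type_ _freq_) mean=logloss;
      run;

      proc append base=lc_all data=lc_mean;
      run;

      %let i = %eval(&i + 1);
      %let frac = %scan(&fractions, &i, %str( ));
   %end;

   /* One line per sample: train versus test */
   proc sgplot data=lc_all;
      series x=frac y=logloss / group=sample markers;
      xaxis label='Fraction of training data used';
      yaxis label='Mean log loss';
   run;
%mend learning_curve;

%learning_curve()
```

As a rough guide to reading the plot: a test curve that stays well above the training curve as the fraction grows suggests overfitting, while two curves that converge at a high loss suggest underfitting. Averaging over several seeds per fraction would give smoother curves than the single draw used here.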
If your SAS release is later than 9.4M4, you could try the GOF option:
model .........../gof ;
@Rick_SAS wrote a blog post about it that compares the statistical and machine learning approaches.
If your SAS version is older, try the LACKFIT option:
model ........../ lackfit firth ;
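Written out against the model from the question, the LACKFIT variant might look like the sketch below. The variable names come from the original post; FIRTH is omitted here because it changes the estimation method (penalized likelihood) rather than the goodness-of-fit test.

```sas
/* Request the Hosmer-Lemeshow goodness-of-fit test for the fitted model */
proc logistic data=train;
   model y(event='1') = income var2 var3 / lackfit;
run;
```

Note that this test checks calibration on the data used to fit the model, so it complements a comparison against the hold-out set and does not replace it.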