Hi,
I'm using SAS Enterprise Guide to fit a model that predicts an ordinal variable ranging from 0 to 10, representing a rating. As predictors, I'm using binary, categorical, and continuous variables. I've tried both a simple linear regression model and a logistic regression model using glogit, probit, and clogit as link functions.
For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.
As for the logistic model, I have a higher misclassification rate even in the training data.
I also tried balancing my sample to better capture the pattern between levels, but I encountered the same problem in both approaches (linear regression and logistic regression).
Do you have any suggestions on which types of modeling I should use and how I can improve, i.e., obtain more accurate scores?
Thank you.
This is a head-scratcher. But could you show us the residual plots from PROC REG if you choose PLOTS=DIAGNOSTICS? Please make screen capture(s) of the plots and include them in your reply by clicking on the "Insert Photos" icon.
Another thought I had is that your x-variables don't vary enough to produce a wider set of predictions. The high R-squared would indicate that most of the data is well predicted and it's only the extremes of Y (where there are few values?) that are not well predicted.
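One way to check whether the extremes of Y really are sparse is to look at how the ratings are distributed. The following is only a minimal sketch: the data set name OSAT_CLIENTES_B2C and the target name OS_TGT are taken from the code posted later in this thread, so adjust them if yours differ.

```sas
/* Frequency of each rating level: shows how few observations sit in the tails */
proc freq data=OSAT_CLIENTES_B2C;
    tables OS_TGT / nocum;
run;

/* Overall spread of the target, including tail percentiles */
proc means data=OSAT_CLIENTES_B2C n mean std min p5 median p95 max;
    var OS_TGT;
run;
```

If nearly all ratings fall between 7 and 9, a narrow range of predicted values together with a high R-squared would be less surprising.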
Hi. Sorry for the delay. I'm just giving my data a little more scrutiny. I'm having trouble getting the residual plots with the suggested option.
ods graphics on;

proc reg data=OSAT_CLIENTES_B2C
         plots=diagnostics;
    /* Stepwise selection with 5% entry and stay significance levels */
    model OS_TGT = &list_vars. / SELECTION=STEPWISE
        SLE=0.05
        SLS=0.05
        INCLUDE=0
        STB SS1 SS2 CORRB COVB CLB
        PCORR1 PCORR2 SCORR1 SCORR2
        ALPHA=0.05
        COLLIN COLLINOINT TOL VIF SPEC ACOV DW;
    /* Save predictions, residuals, and confidence/prediction limits */
    OUTPUT OUT=PREDLINREGPREDICTIONS_X
        PREDICTED=predicted_OS_TGT
        RESIDUAL=residual_OS_TGT
        STUDENT=student_OS_TGT
        RSTUDENT=rstudent_OS_TGT
        LCL=lcl_OS_TGT
        LCLM=lclm_OS_TGT
        UCL=ucl_OS_TGT
        UCLM=uclm_OS_TGT;
run;
quit;
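The post does not say what exactly goes wrong, so the following is only a guess at one possible cause. PROC REG suppresses point-based plots when the data exceed the default MAXPOINTS= limit, and the global plot option MAXPOINTS=NONE lifts that limit. The sketch below assumes the data set is large, and the output data set name work.pred_check is hypothetical. It also summarizes the predicted values within each observed rating level, which shows directly how narrow the predictions are.

```sas
ods graphics on;

/* MAXPOINTS=NONE removes the point limit on plots
   (assumption: the plots are missing because the data set is large) */
proc reg data=OSAT_CLIENTES_B2C plots(maxpoints=none)=diagnostics;
    model OS_TGT = &list_vars. / selection=stepwise sle=0.05 sls=0.05;
    /* work.pred_check is a hypothetical name for the scored data */
    output out=work.pred_check
        predicted=predicted_OS_TGT
        residual=residual_OS_TGT;
run;
quit;

/* Range of predicted values within each observed rating level */
proc means data=work.pred_check n mean min max;
    class OS_TGT;
    var predicted_OS_TGT;
run;
```

If the plots still do not appear, the SAS log should contain a note or warning explaining why they were skipped.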