Hi,
I'm using SAS Enterprise Guide to fit a model that predicts an ordinal variable ranging from 0 to 10, representing a rating. As predictors, I'm using binary, categorical, and continuous variables. I've tried both a linear regression model and a logistic regression model with the glogit, probit, and clogit link functions.
For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.
As for the logistic model, I have a higher misclassification rate even in the training data.
I also tried balancing my sample to better capture the pattern between levels, but I encountered the same problem in both approaches (linear regression and logistic regression).
Do you have any suggestions on which types of modeling I should use and how I can improve, i.e., obtain more accurate scores?
Thank you.
For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.
This is a head-scratcher. But could you show us the residual plots from PROC REG if you choose PLOTS=DIAGNOSTICS? Please make screen capture(s) of the plots and include them in your reply by clicking on the "Insert Photos" icon.
Another thought I had is that your x-variables don't vary enough to produce a wider set of predictions. The high R-squared would indicate that most of the data is well predicted and it's only the extremes of Y (where there are few values?) that are not well predicted.
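One way to check both ideas is to count the observations at each rating level and then look at the spread of the predictions within each level. The sketch below is only an illustration: it assumes the data set and variable names from the PROC REG code posted later in this thread (OSAT_CLIENTES_B2C, OS_TGT, and the OUTPUT data set PREDLINREGPREDICTIONS_X with predicted_OS_TGT), so adjust them if yours differ.

```sas
/* How many observations sit at each rating level (0 to 10)? */
proc freq data=OSAT_CLIENTES_B2C;
   tables OS_TGT / nocum;
run;

/* Spread of the predicted values within each observed rating level.
   Assumes PREDLINREGPREDICTIONS_X was created by the OUTPUT statement
   of PROC REG, which keeps the input variables alongside the predictions. */
proc means data=PREDLINREGPREDICTIONS_X n mean min max;
   class OS_TGT;
   var predicted_OS_TGT;
run;
```

If the counts are concentrated in a few rating levels, the sparse levels contribute little to the overall fit, which would be consistent with predictions that stay in a narrow band.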
Hi. Sorry for the delay. I'm just giving my data a little more scrutiny. I'm having trouble getting the residual plots with the suggested option.
ods graphics on;
proc reg data=OSAT_CLIENTES_B2C
         plots=diagnostics;
   model OS_TGT = &list_vars. / SELECTION=STEPWISE
                                SLE=0.05
                                SLS=0.05
                                INCLUDE=0
                                STB SS1 SS2 CORRB COVB CLB
                                PCORR1 PCORR2 SCORR1 SCORR2
                                ALPHA=0.05
                                COLLIN COLLINOINT TOL VIF SPEC ACOV DW ;
   OUTPUT OUT=PREDLINREGPREDICTIONS_X
          PREDICTED=predicted_OS_TGT
          RESIDUAL=residual_OS_TGT
          STUDENT=student_OS_TGT
          RSTUDENT=rstudent_OS_TGT
          LCL=lcl_OS_TGT
          LCLM=lclm_OS_TGT
          UCL=ucl_OS_TGT
          UCLM=uclm_OS_TGT ;
run;
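One possible reason the diagnostics panel does not appear, offered as an assumption because the log was not posted: by default PROC REG suppresses or simplifies most plots that would need more than 5,000 points, and the MAXPOINTS= global plot option lifts that limit. A minimal sketch with the same data set, response, and &list_vars. macro variable, with the selection and output options left out for brevity:

```sas
ods graphics on;

/* MAXPOINTS=NONE removes the default point limit so the diagnostics
   panel is drawn even for large data sets; this can be slow when the
   data set has many rows. */
proc reg data=OSAT_CLIENTES_B2C plots(maxpoints=none)=diagnostics;
   model OS_TGT = &list_vars.;
run;
quit;
```

If the plots still do not show up, the SAS log usually says why they were suppressed, so it is worth checking for notes or warnings there.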