SCS78
Calcite | Level 5

Hi,

I'm using SAS Enterprise Guide to fit a model that aims to predict an ordinal variable ranging from 0 to 10, representing a rating. As predictors, I'm using binary, categorical, and continuous variables. I've tried both a simple regression model and also a logistic regression model using glogit, probit, and clogit as link functions.

For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.

As for the logistic model, I have a higher misclassification rate even in the training data.

I also tried balancing my sample to better capture the pattern between levels, but I encountered the same problem in both approaches (linear regression and logistic regression).

Do you have any suggestions on which types of modeling I should use and how I can improve, i.e., obtain more accurate scores?

Thank you.

PaigeMiller
Diamond | Level 26

For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.

This is a head-scratcher. Could you show us the residual plots from PROC REG with PLOTS=DIAGNOSTICS? Please take screen capture(s) of the plots and include them in your reply by clicking the "Insert Photos" icon.

Another thought: your x-variables may not vary enough to produce a wider range of predictions. The high R-squared would indicate that most of the data is well predicted and it's only the extremes of Y (where there are few values?) that are not well predicted.

--
Paige Miller
SCS78
Calcite | Level 5

Hi. Sorry for the delay. I'm just giving my data a little more scrutiny. I'm having trouble getting the residual plots with the suggested option.

 

ods graphics on;

proc reg data=OSAT_CLIENTES_B2C plots=diagnostics;
   /* Stepwise selection; entry and stay significance levels are both 0.05 */
   model OS_TGT = &list_vars. /
         SELECTION=STEPWISE SLE=0.05 SLS=0.05 INCLUDE=0
         STB SS1 SS2 CORRB COVB CLB
         PCORR1 PCORR2 SCORR1 SCORR2
         ALPHA=0.05
         COLLIN COLLINOINT TOL VIF SPEC ACOV DW;
   /* Save predictions, residuals, and confidence limits for inspection */
   OUTPUT OUT=PREDLINREGPREDICTIONS_X
          PREDICTED=predicted_OS_TGT
          RESIDUAL=residual_OS_TGT
          STUDENT=student_OS_TGT
          RSTUDENT=rstudent_OS_TGT
          LCL=lcl_OS_TGT
          LCLM=lclm_OS_TGT
          UCL=ucl_OS_TGT
          UCLM=uclm_OS_TGT;
run;
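
A possible follow-up to the step above, offered only as a sketch. It assumes the missing plots are caused by PROC REG suppressing graphs for large data sets (by default, plots that need more than 5000 points are not drawn); the log is not shown in the thread, so this is a guess rather than a diagnosis. It also reuses the OUTPUT data set to check how many observations fall in each rating level and how far the predictions spread within each level.

```sas
ods graphics on;

/* Assumption: the data set has more than 5000 rows, so the diagnostics
   panel was suppressed. MAXPOINTS=NONE lifts that limit. */
proc reg data=OSAT_CLIENTES_B2C plots(maxpoints=none)=diagnostics;
   model OS_TGT = &list_vars. / selection=stepwise sle=0.05 sls=0.05;
run;
quit;

/* How many observations fall in each rating level (are the tails thin?) */
proc freq data=PREDLINREGPREDICTIONS_X;
   tables OS_TGT / nocum;
run;

/* Spread of the predicted values within each observed rating level */
proc means data=PREDLINREGPREDICTIONS_X n mean min max;
   class OS_TGT;
   var predicted_OS_TGT;
run;
```

If nearly all observations sit at ratings 7 to 9, a narrow range of predictions together with a high R-squared would be consistent with the point made earlier in the thread about the extremes of Y.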


Ksharp
Super User
For an ordinal variable with this many levels, machine learning methods such as decision trees, random forests, and neural nets are well suited. Check:
PROC HPSPLIT
PROC HPFOREST
PROC HPSVM
....
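
As one illustration of the procedures listed above, a minimal PROC HPSPLIT sketch follows. The data set, target, and &list_vars. come from the earlier post; &class_vars. (the binary and categorical predictors) and the output data set name are hypothetical and would need to be defined. The depth and validation fraction are arbitrary starting values, not recommendations from the thread.

```sas
/* Sketch only. &class_vars. is a hypothetical macro variable listing the
   binary and categorical predictors; it is not defined in the thread. */
proc hpsplit data=OSAT_CLIENTES_B2C maxdepth=6;
   /* Listing OS_TGT in CLASS makes this a classification tree;
      leaving it out would fit a regression tree instead. */
   class OS_TGT &class_vars.;
   model OS_TGT = &list_vars.;
   grow entropy;
   prune costcomplexity;
   /* Hold out 30% of the rows for validation */
   partition fraction(validate=0.3 seed=12345);
   /* hpsplit_scored is a hypothetical output data set name */
   output out=hpsplit_scored;
run;
```

Comparing training and validation misclassification rates in the output is one way to judge whether the tree is overfitting.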

You could also try PROC PLS (partial least squares), which is very robust and accurate and, unlike machine learning methods, does not usually end up overfitting.
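
A minimal PROC PLS sketch along these lines, under the same assumptions as before: &class_vars. and the output names are hypothetical, and the rating is treated as a continuous response, which is a simplification for an ordinal 0 to 10 scale.

```sas
/* Sketch only. &class_vars. and pls_scored are hypothetical names. */
proc pls data=OSAT_CLIENTES_B2C method=pls cv=split cvtest(seed=12345);
   /* Binary and categorical predictors go in CLASS */
   class &class_vars.;
   model OS_TGT = &list_vars.;
   /* Keep the predicted rating for comparison with the PROC REG predictions */
   output out=pls_scored predicted=pls_pred_OS_TGT;
run;
```

Split-sample cross-validation with CVTEST lets the procedure choose the number of factors, which is one guard against overfitting.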
