SCS78
Calcite | Level 5

Hi,

I'm using SAS Enterprise Guide to fit a model that aims to predict an ordinal variable ranging from 0 to 10, representing a rating. As predictors, I'm using binary, categorical, and continuous variables. I've tried both a simple regression model and also a logistic regression model using glogit, probit, and clogit as link functions.

For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.

As for the logistic model, I have a higher misclassification rate even in the training data.
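For reference, a cumulative-logit fit and its training confusion table could be sketched roughly as below. This is only an illustration, not the exact code behind the numbers above: the data set, target, and macro variable names are the ones from the PROC REG code later in this thread, WORK.ORD_PRED is a made-up output name, and any categorical predictors would also need a CLASS statement.

```sas
/* Sketch only: cumulative-logit (proportional odds) model for the 0-10 rating. */
/* WORK.ORD_PRED is a hypothetical output data set name.                        */
proc logistic data=OSAT_CLIENTES_B2C;
    model OS_TGT = &list_vars. / link=clogit;
    /* PREDPROBS=INDIVIDUAL adds per-level probabilities plus _FROM_ and _INTO_ */
    output out=WORK.ORD_PRED predprobs=individual;
run;

/* _FROM_ is the observed level, _INTO_ the level with the highest probability. */
proc freq data=WORK.ORD_PRED;
    tables _FROM_ * _INTO_ / norow nocol nopercent;
run;
```

The off-diagonal cells of the cross-tabulation are the misclassified training observations, so it shows which rating levels are being confused with each other.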

I also tried balancing my sample to better capture the pattern between levels, but I encountered the same problem in both approaches (linear regression and logistic regression).

Do you have any suggestions on which types of modeling I should use and how I can improve, i.e., obtain more accurate scores?

Thank you.

3 REPLIES
PaigeMiller
Diamond | Level 26

For the regression model, I obtained a reasonably good fit with an R-squared around 87%. However, when I look at the predicted values, they always range between 7.3 and 8.5 for all levels of the target variable. I assume that the tails of my response are not being properly captured even with the high R-squared.

 

This is a head-scratcher. But could you show us the residual plots from PROC REG if you choose PLOTS=DIAGNOSTICS? Please make screen capture(s) of the plots and include them in your reply by clicking on the "Insert Photos" icon.

 

Another thought I had is that your x-variables may not vary enough to produce a wider range of predictions. The high R-squared would indicate that most of the data is well predicted and it's only the extremes of Y (where there are few values?) that are not well predicted.
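One way to check this, as a rough sketch: look at how the observed ratings are distributed and how the predictions spread within each observed level. The data set and variable names below are assumed from the PROC REG code posted later in this thread (the OUTPUT data set and its PREDICTED= variable), so adjust them to whatever you actually have.

```sas
/* Sketch only: how many observations fall in each rating level? */
proc freq data=OSAT_CLIENTES_B2C;
    tables OS_TGT / missing;
run;

/* Spread of the predicted values within each observed rating level. */
/* Assumes the OUTPUT data set from the PROC REG step already exists. */
proc means data=PREDLINREGPREDICTIONS_X n mean std min max;
    class OS_TGT;
    var predicted_OS_TGT;
run;
```

If nearly all observations sit at ratings 7 to 9, a narrow range of predictions is less surprising, and the few low ratings would show up as large residuals.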

--
Paige Miller
SCS78
Calcite | Level 5

Hi. Sorry for the delay. I'm just giving my data a little more scrutiny. I'm having trouble getting the residual plots with the suggested option.

 

ods graphics on;

proc reg data=OSAT_CLIENTES_B2C plots=diagnostics;
    model OS_TGT = &list_vars. /
        SELECTION=STEPWISE SLE=0.05 SLS=0.05 INCLUDE=0
        STB SS1 SS2 CORRB COVB CLB
        PCORR1 PCORR2 SCORR1 SCORR2
        ALPHA=0.05
        COLLIN COLLINOINT TOL VIF SPEC ACOV DW;
    OUTPUT OUT=PREDLINREGPREDICTIONS_X
        PREDICTED=predicted_OS_TGT
        RESIDUAL=residual_OS_TGT
        STUDENT=student_OS_TGT
        RSTUDENT=rstudent_OS_TGT
        LCL=lcl_OS_TGT
        LCLM=lclm_OS_TGT
        UCL=ucl_OS_TGT
        UCLM=uclm_OS_TGT;
run;
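One possible cause, and this is an assumption because the log is not shown: PROC REG suppresses most point-based plots (or switches to heat maps) once the data set has more points than the MAXPOINTS= limit, which is about 5000 by default. A minimal sketch of lifting that limit, with the MODEL options trimmed to the selection settings for brevity:

```sas
ods graphics on;

/* Sketch only: MAXPOINTS=NONE removes the point limit on the plots. */
/* DIAGNOSTICS and RESIDUALS are standard PROC REG plot requests.    */
proc reg data=OSAT_CLIENTES_B2C plots(maxpoints=none)=(diagnostics residuals);
    model OS_TGT = &list_vars. / selection=stepwise sle=0.05 sls=0.05;
run;
quit;
```

If the plots still do not appear, the log usually says why, and in Enterprise Guide it is also worth checking that the chosen results format can display ODS Graphics output.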

 

Ksharp
Super User
For a target with many ordinal levels, machine learning methods such as decision trees, random forests, and neural nets are well suited. Check:
PROC HPSPLIT
PROC HPFOREST
PROC HPSVM
....

You could also try PROC PLS (partial least squares), which is very robust and accurate and, unlike machine learning methods, does not usually end up with an overfitting problem.
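A minimal PROC PLS sketch, assuming the same data set, target, and &list_vars. macro variable used earlier in the thread. WORK.PLS_PRED and pls_pred_OS_TGT are hypothetical names, and categorical predictors would need to be listed in a CLASS statement first.

```sas
/* Sketch only: partial least squares with 10-fold split-sample cross validation. */
/* The number of extracted factors is chosen from the cross-validated PRESS.      */
proc pls data=OSAT_CLIENTES_B2C method=pls cv=split(10);
    model OS_TGT = &list_vars.;
    /* WORK.PLS_PRED and pls_pred_OS_TGT are made-up names for illustration. */
    output out=WORK.PLS_PRED predicted=pls_pred_OS_TGT;
run;
```

The predictions in WORK.PLS_PRED can then be compared against OS_TGT by rating level, in the same way as for the regression output.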
