ycenycute
Obsidian | Level 7

In SAS Enterprise Miner, the Variable Selection node offers an R-square method. I checked this document, which gives a good explanation of how the R-square method works. What I am wondering is how R-square is calculated in the first step. Does it run a regression of the target variable on each input separately, and then take the R-square for each input variable?

1 ACCEPTED SOLUTION

PaigeMiller
Diamond | Level 26

Some variable selection algorithms (often known as "stepwise") go like this:

 

Step 1: compute the R-squared for each x variable on its own (one single-variable model per x), then select the variable with the highest R-squared to be the first variable included in the model. (For example, let's say X7 has the highest R-squared; the model is now Y = X7.)

 

Step 2: compute the R-squared for all of the possible models with X7 and ONE other variable. The variable in the model with the highest R-squared becomes the second variable included in the model. (For example, let's say X2 gives the highest R-squared in this step; the model is now Y = X7 X2.)

 

Continue until the increase in R-squared is less than some pre-specified threshold, or until the variable added isn't statistically significant, or ... there are all sorts of variations of this algorithm.

 

NOTE: this algorithm is not guaranteed to find, among all possible models with k terms, the one with the highest R-squared.
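
The steps above can be sketched in a few lines of Python. This is only a minimal illustration of generic forward selection on R-squared, not the code that Enterprise Miner runs. The helper names `r_squared` and `forward_select`, the `min_gain` stopping threshold, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on the columns of X, with intercept."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ coef
    ss_res = float(resid @ resid)                  # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return 1.0 - ss_res / ss_tot

def forward_select(X, y, min_gain=0.01):
    """Add one column at a time, always the one giving the highest R-squared.

    Stops when the best available increase in R-squared is below min_gain
    (a hypothetical stopping rule chosen for this sketch).
    """
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # R-squared of every model made of the current selection plus ONE more column
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2

# Simulated data: only columns 3 and 1 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 3] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

selected, r2 = forward_select(X, y)
print("selected columns:", selected, "R-squared:", round(r2, 3))
```

Because each step only looks one variable ahead, the search is greedy, which is why it can miss the best k-term model mentioned in the NOTE.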

--
Paige Miller


3 REPLIES
sbxkoenk
SAS Super FREQ

Hello @ycenycute ,

 

R-square(d) measures the strength of the relationship between your model (your input / independent variables & the functional form of the model) and the dependent variable on a convenient 0 – 100% scale.

It measures how much of the total variance in your dependent variable is explained by the model; the more, the better, of course.
R-squared is ubiquitous in statistics, which is also why people tend to use it uncritically (a high R-squared is not always good news).
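
For reference, the usual textbook definition behind "share of variance explained" can be written as follows. The symbols are the standard ones and are not taken from the SAS documentation: $y_i$ are the observed values, $\hat{y}_i$ the model's fitted values, and $\bar{y}$ the mean of the observed values.

```latex
R^2 \;=\; 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

For a least-squares fit with an intercept this value lies between 0 and 1, which is the 0 to 100% scale mentioned above.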

The main disadvantage of R-squared is that it never decreases, and in practice almost always increases, when you add an additional input to your model (even if that input is not contributing significantly to the power of the model and is only explaining a bit of noise).
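
A small simulation makes this concrete. It is a sketch on made-up data: the `r_squared` helper and the variable names are assumptions for the example, and the `noise` column is generated independently of the target.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on the columns of X, with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
noise = rng.normal(size=n)          # unrelated to y by construction

r2_base = r_squared(x.reshape(-1, 1), y)
r2_with_noise = r_squared(np.column_stack([x, noise]), y)

# The larger model can always reproduce the smaller one by giving the
# extra column a coefficient of zero, so its R-squared cannot be lower.
print("R-squared with x only:     ", r2_base)
print("R-squared with x and noise:", r2_with_noise)
```

This is why stepwise procedures stop on a minimum improvement or a significance test rather than on "R-squared went up".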

 

Anyway, how does the R-square selection method in the Variable Selection node of Enterprise Miner work?

Read it here:

SAS® Enterprise Miner™ 15.1: Reference Help

Variable Selection Node
https://go.documentation.sas.com/doc/en/emref/15.1/n1m7rvh6yyb3mmn0zavezsher4ml.htm

 

In short, in the Forward Stepwise Regression, ... at each successive step, an additional input variable is chosen that provides the largest incremental increase in the model R**2.

Forward Stepwise means you start with zero inputs in the model and then you add the one that provides the biggest R**2 in the simple model (the model with one input), then you add a 2nd variable (the one that provides the largest incremental increase in the model R**2) and so on ... until stopping criteria are met.

I suggest you come back to us with whatever you still do not understand there (i.e. in the doc).

 

Kind regards,
Koen

ycenycute
Obsidian | Level 7
Thanks for the reply. I understand the meaning of R-square. I was asking how R-square is calculated in the first step. The manual describes two steps (three for a binary target). So in the first step, is SAS running a linear regression of the target on each input separately, and then picking those inputs whose R-square is above the threshold?
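
One general fact is relevant to this question: for a single numeric input, the R-square of a simple linear regression equals the squared Pearson correlation between that input and the target, so a per-input screening of this kind does not require fitting separate regressions. The sketch below checks that identity on simulated data. The `threshold` value and all names are illustrative assumptions, and what Enterprise Miner actually computes in its first step should be confirmed against the linked documentation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 4))
y = 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(size=n)

def simple_regression_r2(x, y):
    """R-square from fitting y = intercept + slope * x by least squares."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    centered = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)

# Route 1: one simple regression per input.
r2_by_regression = np.array(
    [simple_regression_r2(X[:, j], y) for j in range(X.shape[1])]
)
# Route 2: squared Pearson correlation per input.
r2_by_correlation = np.array(
    [np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(X.shape[1])]
)

threshold = 0.005   # illustrative cutoff, not taken from this thread
kept = [j for j in range(X.shape[1]) if r2_by_correlation[j] >= threshold]

print(np.allclose(r2_by_regression, r2_by_correlation))
print("inputs passing the screen:", kept)
```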