podarum
Quartz | Level 8

What is the best procedure to use if I want to:

1) Find the best variables to use in a model out of 30, and

2) Examine the best breaks or cutoffs once I find those variables?

For example, a score may be the best predictor of default (the dependent variable), segmented at 300, 500, 650, etc. Thanks

6 REPLIES
stat_sas
Ammonite | Level 13

Hi,

I am assuming that you've identified the variables which will be used as predictors. PROC VARCLUS can identify variables that load heavily and explain most of the variation. That way you can select a subset of the variables, possibly far fewer than 30, for further analysis. In the second phase, use k-means clustering to find the best cutoffs.
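To make the second phase concrete, here is a minimal sketch of k-means on a single variable, written in plain Python rather than SAS so the logic is visible. The function name `kmeans_1d_cutoffs` and the scores are made up for illustration; this is not what any SAS procedure runs internally.

```python
def kmeans_1d_cutoffs(values, k, iters=100):
    """Lloyd's k-means on one variable; returns the k-1 boundaries between clusters."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    # In one dimension the boundary between adjacent clusters is the
    # midpoint of their centers.
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Hypothetical scores, invented for this example.
scores = [280, 290, 300, 310, 490, 500, 510, 520, 640, 650, 660, 670]
cutoffs = kmeans_1d_cutoffs(scores, k=3)
print(cutoffs)  # → [400.0, 580.0]
```

Note that k-means only looks at how the score itself is distributed. It never sees the default flag, so the cutoffs mark natural gaps in the score, not necessarily the points where default rates change.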

Thanks,

podarum
Quartz | Level 8

Thank you for the response, this is very helpful. What do you mean by loading heavily?

stat_sas
Ammonite | Level 13

This is a data reduction concept: we try to reduce the dimensionality of the data. PROC VARCLUS applies principal components to identify groups of variables that are highly correlated within their own cluster but only weakly correlated with the other groups. A loading is the correlation between a variable and a principal component. Please refer to the following link for further details.

http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_varclus_sect...
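As a rough illustration of what a loading is, the sketch below computes the first principal component of a few standardized variables and then correlates each variable with that component. It is plain Python, the column names `x1`, `x2`, `x3` and their values are invented, and it shows only the loading calculation, not the cluster-splitting that PROC VARCLUS does on top of it.

```python
import math


def standardize(col):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return [(v - m) / sd for v in col]


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)


def first_component_loadings(columns, iters=500):
    names = list(columns)
    p = len(names)
    z = {name: standardize(columns[name]) for name in names}
    r = [[corr(columns[a], columns[b]) for b in names] for a in names]
    # Power iteration: converges to the leading eigenvector of the
    # correlation matrix, i.e. the weights of the first principal component.
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(r[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    n = len(z[names[0]])
    scores = [sum(w[i] * z[name][row] for i, name in enumerate(names))
              for row in range(n)]
    # Loading = correlation between each variable and the component scores.
    return {name: corr(columns[name], scores) for name in names}


# Invented data: x1 and x2 move together, x3 is nearly unrelated to both.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [5, 1, 4, 2, 6, 3],
}
loadings = first_component_loadings(data)
print(loadings)
```

Here `x1` and `x2` load heavily on the first component (correlations close to 1) while `x3` does not, which is the sense in which two variables would end up in the same cluster.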

podarum
Quartz | Level 8

I was hoping to find a procedure that finds the variables that are most significant to the dependent variable. If, for example, I have 20 variables and 1 dependent variable, I want to know which of the 20 variables are best at predicting the dependent variable.

SteveDenham
Jade | Level 19

Why?  You have all 20 measures.  Do you mean which variables are most closely correlated with the predicted value?  Then you need to consider the role of moderating and mediating variables.  Or do you mean which single variable is the best predictor?  If so, again I ask, why?  If you have all variables available, then to not use them is just, well, ignoring what you do have.  Or do you mean which variable (or variables) are the most economical predictors, in the sense of future data?  By economical, I mean those that lead to accurate predicted values for the least cost of measurement.  I think you are concerned about building a predictive model.  If so, subject matter expertise should enter as well as statistical considerations.  Parsimony for the sake of parsimony alone will always lead to poor predictive models, just as overcomplexity can.

Use the methods outlined by @stat_sas above to get started.  If you feel some sort of compulsion to try variable selection methods, look at the LAR and LASSO methods in PROC GLMSELECT.  DO NOT USE STEPWISE, FORWARD, BACKWARD OR ALL POSSIBLE SUBSETS REGRESSION.  These have been shown to produce biased results that lead to poor predictive models.  Google "Flom Cassell" for more info, or read Frank Harrell's book on regression methods.
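For anyone wondering how LASSO manages to drop variables, here is a minimal sketch of the idea using coordinate descent with soft-thresholding, in plain Python. It is not PROC GLMSELECT's implementation, it fits a linear model to a 0/1 response purely for illustration, and the data, the column names `fico` and `tenure`, and the penalty `lam` are all made up.

```python
import math


def standardize(col):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return [(v - m) / sd for v in col]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(columns, y, lam, iters=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized predictors."""
    names = list(columns)
    n = len(y)
    x = {name: standardize(columns[name]) for name in names}
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    beta = {name: 0.0 for name in names}
    for _ in range(iters):
        for name in names:
            # Residual with every predictor except the current one.
            resid = [yc[i] - sum(beta[o] * x[o][i] for o in names if o != name)
                     for i in range(n)]
            rho = sum(x[name][i] * resid[i] for i in range(n)) / n
            scale = sum(v * v for v in x[name]) / n
            beta[name] = soft_threshold(rho, lam) / scale
    return beta


# Invented toy data: 8 loans, 1 = bad, 0 = not bad.
predictors = {
    "fico": [580, 600, 620, 640, 700, 720, 740, 760],
    "tenure": [3, 1, 2, 2, 1, 3, 2, 2],
}
bad = [1, 1, 1, 0, 1, 0, 0, 0]

beta = lasso(predictors, bad, lam=0.2)
print(beta)
```

On this toy data the penalty sets the `tenure` coefficient to exactly zero and keeps a shrunken negative coefficient for `fico`. That shrink-then-drop behavior, controlled by one penalty chosen by cross validation, is what separates LASSO from the stepwise family.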

Steve Denham

podarum
Quartz | Level 8

Great advice.. thanks..

To answer your questions, I have 20 variables as predictors (for example time-on-books, FICO score, utilization, location, product, etc.) and 1 response variable (bad or not bad, as in defaulted loans). A business unit has asked me to create a chart of the response variable segmented by the top 3 predictors, for example the bads/goods separated by Location, Product and FICO. It has to be the 3 most significant predictors, similar to a decision tree.
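Once the predictors and cutoffs are chosen, the chart itself is just a bad rate per segment. A small sketch of that tabulation in plain Python, assuming made-up loans, a hypothetical `fico_band` helper, and arbitrary cutoffs of 600 and 700:

```python
import bisect
from collections import defaultdict


def fico_band(score, cutoffs=(600, 700)):
    """Map a score to a band label; the cutoffs here are arbitrary examples."""
    labels = ["<600", "600-699", "700+"]
    return labels[bisect.bisect_right(cutoffs, score)]


def bad_rate_by_segment(rows, keys, target="bad"):
    """Return {segment tuple: share of rows with target == 1}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [n_bad, n_total]
    for row in rows:
        segment = tuple(row[k] for k in keys)
        counts[segment][0] += row[target]
        counts[segment][1] += 1
    return {seg: n_bad / total for seg, (n_bad, total) in counts.items()}


# Invented loans for illustration only.
loans = [
    {"location": "East", "fico": 580, "bad": 1},
    {"location": "East", "fico": 590, "bad": 1},
    {"location": "East", "fico": 720, "bad": 0},
    {"location": "West", "fico": 610, "bad": 1},
    {"location": "West", "fico": 650, "bad": 0},
    {"location": "West", "fico": 730, "bad": 0},
]
for loan in loans:
    loan["fico_band"] = fico_band(loan["fico"])

rates = bad_rate_by_segment(loans, ["location", "fico_band"])
for segment, rate in sorted(rates.items()):
    print(segment, rate)
```

Adding a third key such as product to the `keys` list gives the three-way segmentation; with real data each cell should hold enough loans for its bad rate to mean something.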

