find the best variables to use and best segmentation

Super Contributor
Posts: 401

What is the best procedure to use if I want to:

1) Find the best variables to use in a model, out of 30 candidates, and

2) Examine the best breaks or cut-offs once I find those variables?

For example, a score may be the best predictor of default (the dependent variable), segmented at 300, 500, 650, etc. Thanks.

Trusted Advisor
Posts: 1,228

Re: find the best variables to use and best segmentation

Hi,

I am assuming that you've already identified the variables that will be used as predictors. PROC VARCLUS can identify the variables that load heavily and explain most of the variation. That way you can select a subset of the variables, possibly far fewer than 30, for further analysis. In the second phase, use k-means clustering to find the best cut-offs.
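
The second phase could be sketched roughly as follows. This is only an illustration: the data set `work.loans`, the score variable `fico`, and the choice of four clusters are all assumptions, not something from this thread. PROC FASTCLUS performs k-means clustering, and when it is run on a single variable the minimum and maximum of each cluster suggest where the cut-offs fall.

```sas
/* Assumed names: work.loans and fico are placeholders.           */
/* k-means on one variable; MAXCLUSTERS=4 is an arbitrary choice. */
proc fastclus data=work.loans maxclusters=4 out=work.fico_clusters noprint;
   var fico;
run;

/* The range of each cluster shows the candidate cut-offs. */
proc means data=work.fico_clusters n min max;
   class cluster;
   var fico;
run;
```

Note that k-means looks only at the distribution of the score, not at the response, so these breaks describe where the scores bunch together rather than where the default rate changes.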

Thanks,

Super Contributor
Posts: 401

Re: find the best variables to use and best segmentation

Thank you for the response, this is very helpful. What do you mean by "loading heavily"?

Trusted Advisor
Posts: 1,228

Re: find the best variables to use and best segmentation

This is a data reduction concept: we try to reduce the dimensionality of the data. PROC VARCLUS applies principal components to identify groups of variables that are highly correlated within their own cluster but only weakly correlated with the other clusters. A loading is the correlation between a variable and a principal component. Please refer to the following link for further details.

http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_varclus_sect...
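
A minimal sketch of such a run, assuming a data set `work.loans` and numeric predictors with the hypothetical names below. The `MAXEIGEN=0.7` threshold is an illustrative choice, not a recommendation.

```sas
/* Assumed names: work.loans and the variable list are placeholders. */
/* Clusters keep splitting while the second eigenvalue of any        */
/* cluster exceeds 0.7 (an illustrative threshold).                  */
proc varclus data=work.loans maxeigen=0.7;
   var time_on_books fico utilization;   /* list all numeric predictors here */
run;
```

In the output, the "Cluster Structure" table gives the correlation of each variable with each cluster component, which is the loading described above. The "1-R**2 Ratio" column is commonly used to pick one representative per cluster: a low ratio means the variable is close to its own cluster and far from the next nearest one. PROC VARCLUS never looks at the dependent variable, so it reduces redundancy among predictors rather than ranking them by predictive power.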

Super Contributor
Posts: 401

Re: find the best variables to use and best segmentation

I was hoping to find a procedure that identifies the variables that are most significant with respect to the dependent variable. For example, if I have 20 variables and 1 dependent variable, I want to know which of the 20 variables are best at predicting the dependent variable.

Respected Advisor
Posts: 2,655

Re: find the best variables to use and best segmentation

Why? You have all 20 measures. Do you mean which variables are most closely correlated with the predicted value? Then you need to consider the role of moderating and mediating variables. Or do you mean which single variable is the best predictor? If so, again I ask, why? If you have all the variables available, then not using them is just, well, ignoring what you do have. Or do you mean which variable (or variables) are the most economical predictors for future data? By economical, I mean those that lead to accurate predicted values for the least cost of measurement. I think you are concerned with building a predictive model. If so, subject matter expertise should enter as well as statistical considerations. Parsimony for the sake of parsimony alone will always lead to poor predictive models, just as overcomplexity can.

Use the methods outlined by @stat@sas above to get started.  If you feel some sort of compulsion to try variable selection methods, look at LAR and LASSO methods in GLMSELECT.  DO NOT USE STEPWISE, FORWARD, BACKWARD OR ALL POSSIBLE SUBSETS REGRESSION.  These have been shown to produce biased results that lead to poor predictive models.  Google "Flom Cassell" for more info, or read Frank Harrell's book on regression methods.
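
For anyone who wants to try the LASSO route, a sketch might look like the following. The data set `work.loans`, the 0/1 response `bad`, and the predictor names are all assumed for illustration; five-fold cross validation and the seed are arbitrary choices.

```sas
/* Assumed names: work.loans, bad (coded 0/1), and the predictors  */
/* are placeholders. CHOOSE=CV picks the step with the lowest      */
/* cross-validated error; STOP=NONE lets the full path be explored. */
proc glmselect data=work.loans seed=12345;
   class location product;
   model bad = time_on_books fico utilization location product
         / selection=lasso(choose=cv stop=none) cvmethod=random(5);
run;
```

PROC GLMSELECT fits linear models, so a 0/1 response is treated as a numeric outcome here. That makes the selected effects a screening result to be checked against subject matter knowledge, not a final model for a binary default flag.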

Steve Denham

Super Contributor
Posts: 401

Re: find the best variables to use and best segmentation

Posted in reply to SteveDenham

Great advice, thanks.

To answer your questions: I have 20 variables as predictors (for example time-on-books, FICO score, utilization, location, product, etc.) and 1 response variable (bad or not bad, as in defaulted loans). A business unit has asked me to create a chart of the response variable segmented by the top 3 predictors, for example separating the bads from the goods by location, product, and FICO. It has to be the 3 most significant predictors, similar to a decision tree.
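
Since the request resembles a decision tree, one possible sketch is to grow a shallow tree with PROC HPSPLIT and then tabulate the bad rate by segment. Everything named here is an assumption: the data set `work.loans`, the response `bad`, the predictor names, and the depth of 3. Location, product, and FICO are used in the table only because they were the example above, not because anything shows they are the top three.

```sas
/* Assumed names throughout: work.loans, bad, and the predictors     */
/* are placeholders.                                                 */

/* Grow a shallow classification tree; the variable importance table */
/* in the output ranks the predictors the tree actually used.        */
proc hpsplit data=work.loans maxdepth=3;
   class bad location product;
   model bad = time_on_books fico utilization location product;
   grow entropy;
   prune costcomplexity;
run;

/* Band the score at the cut-offs mentioned in the original question. */
proc format;
   value ficoband
      low -< 300 = 'Below 300'
      300 -< 500 = '300 to under 500'
      500 -< 650 = '500 to under 650'
      650 - high = '650 and above';
run;

/* Bad rate by segment: row percentages of bad within each FICO band, */
/* shown separately for every location and product combination.       */
proc freq data=work.loans;
   format fico ficoband.;
   tables location*product*fico*bad / nocol nopercent;
run;
```

The split points the tree chooses for a continuous variable such as the score can also serve as data-driven cut-offs in place of the fixed 300/500/650 bands.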
