Collinearity problem in robust regression


Posted 07-11-2018 04:09 AM (2129 views)

I would appreciate any insights you can offer on this problem.

I was running a robust regression with continuous and categorical variables. To do so, I converted the categorical variables into dummy variables.

proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
   model inbody_pbf=sdtaqdum2 sdtaqdum3 sdtaqdum4 age bmi waist bsa
         exerc2 smokdum2 smokdum3 drinkdum2 drinkdum3 jobdum1 jobdum2 jobdum3 jobdum4 jobdum5 jobdum6
         jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum13 / diagnostics leverage (opc mcdinfo) ;
   output out=robrefsittapbfresw5 weight=wgt ;
   test sdtaqdum2 sdtaqdum3 sdtaqdum4 ;
run ;

However, after adding more dummy variables (the ones shown in blue above), I got the message below. My earlier models did not have this problem, and even using the CLASS statement did not make it go away.

WARNING: The design matrix is singular. Some regressors are dropped from the matrix. LEVERAGE is being computed on the reduced design matrix.
ERROR: The current MM estimation failed because a collinearity problem for a subset of the dataset occurred in its initial LTS estimation.
ERROR: Initial LTS estimator cannot be computed.

7 REPLIES


If you have 13 levels of JOB, then you need 12 (not 13) dummy variables, jobdum1 through jobdum12. The same applies to your other dummy variables. This eliminates the message.

Using ROBUSTREG will not solve any collinearity problems; ROBUSTREG is effective in the presence of outliers.
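The dummy-variable trap described above can be verified with a small numeric sketch (Python/NumPy here rather than SAS, with made-up data): when the model has an intercept, the dummies for all k levels of a categorical variable sum to the intercept column, so the design matrix is rank deficient until one level is dropped.

```python
import numpy as np

# Hypothetical data: a categorical variable JOB with 3 levels, n = 6 rows.
job = np.array([0, 0, 1, 1, 2, 2])
intercept = np.ones(len(job))

# One dummy per level PLUS an intercept: the dummies sum to the intercept.
full_dummies = np.column_stack(
    [intercept] + [(job == k).astype(float) for k in range(3)])
print(np.linalg.matrix_rank(full_dummies))  # 3, not 4: rank deficient

# Dropping one level (the reference) restores full column rank.
reduced = np.column_stack(
    [intercept] + [(job == k).astype(float) for k in range(1, 3)])
print(np.linalg.matrix_rank(reduced))  # 3 = number of columns
```

This is exactly why 13 levels need only 12 dummies once an intercept is in the model.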

--

Paige Miller


Thanks @PaigeMiller for your answer. I did in fact use just 12 dummy variables, omitting jobdum12 as the reference, but I still got the same disappointing results. As you mention, robust regression is meant for data with many outliers or large leverage points; in this specific case, however, the collinearity problem appeared only after adding these last dummy variables. I had previously run the same regression with other dummies that had fewer categories, and it worked.

I would be grateful for any additional feedback. Greetings


@Lop wrote:

Thanks @PaigeMiller for your answer. I did in fact use just 12 dummy variables, omitting jobdum12 as the reference, but I still got the same disappointing results. As you mention, robust regression is meant for data with many outliers or large leverage points; in this specific case, however, the collinearity problem appeared only after adding these last dummy variables. I had previously run the same regression with other dummies that had fewer categories, and it worked.

I would be grateful for any additional feedback. Greetings

You have to make this change for all of your dummy variables, not just the ones for JOB. Or better yet, do what @Rick_SAS suggested.

--

Paige Miller


Dear @Rick_SAS and @PaigeMiller, thanks for replying.

So far I have tried the options you suggested and got the same problem. Perhaps it is related to the syntax I am using.

/*Option 1*/
proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
   model inbody_pbf=sittaqdum2 sittaqdum3 sittaqdum4 age bmi waist bsa exerc2 smokdum2 smokdum3 drinkdum2 drinkdum3
         jobdum2 jobdum3 jobdum4 jobdum5 jobdum6 jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum12 jobdum13 / diagnostics leverage (opc mcdinfo) ;
   output out=robrefsittapbfresw5 weight=wgt ;
   test sittaqdum2 sittaqdum3 ;
run ;

/*Option 2*/
proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
   class sdta_quart exerc2 smoking drinking a7_1 ;
   model inbody_pbf=sdta_quart age bmi waist bsa exerc2 smoking drinking a7_1 / diagnostics leverage (opc mcdinfo) ;
   output out=robrefsittapbfresw5 weight=wgt ;
   test sdta_quart ;
run ;

After this, let me ask a couple of questions.

Is there a way to control the reference level for categorical variables in robust regression, the way there is in logistic regression?

Does SAS support categorical variables with this many categories, as in this case?

Thanks once more for your valuable insights.

Lop,


The MM method is initialized by using the LTS method, and that is the algorithm that is failing. Try

PROC ROBUSTREG method=MM(INITEST= S) ...

and maybe the S method will converge to an initial estimate.

Alternatively, if you stick with LTS as the initializer, you can try increasing the default H (breakdown) value. The syntax is

PROC ROBUSTREG method=MM(INITEST= LTS H=0.24) ...

where the H= value depends on the size of your data.

If that doesn't work, you might need to change to the M method. I think the number of categorical variables is causing this problem. You only have 4 continuous variables whereas you have dozens of categorical levels. The algorithms for robust regression were created for continuous variables. Later people tried to extend them to support discrete variables, but as the ROBUSTREG doc says:

Note: Because the LTS and S methods use subsampling algorithms, these methods are not suitable in an analysis that uses variables that have only a few unequal values..... For example, indicator variables that correspond to a classification variable often fall into this category. The same issue also applies to the initial LTS and S estimates in the MM method. For a model that includes classification independent variables or continuous independent variables with a few unequal values, the M method is recommended.
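The subsampling issue that note describes can be illustrated outside SAS with a small simulation (Python/NumPy, hypothetical data): when one dummy level is rare, many of the small row subsets that LTS and S draw contain no observation from that level, so the dummy column is constant within the subset and the subset design matrix is singular.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 200 rows, one rare category (~5% of rows),
# design = intercept + that one dummy + one continuous covariate.
n = 200
rare = (rng.random(n) < 0.05).astype(float)
X = np.column_stack([np.ones(n), rare, rng.normal(size=n)])

# LTS-style subsampling draws small row subsets. If a subset contains no
# row from the rare category, the dummy column is all zeros there and the
# subset design matrix loses rank.
subset_size = 10
trials = 1000
singular = 0
for _ in range(trials):
    idx = rng.choice(n, size=subset_size, replace=False)
    if np.linalg.matrix_rank(X[idx]) < X.shape[1]:
        singular += 1
print(f"{singular / trials:.0%} of random subsets were singular")
```

With many categorical levels in the model, the chance that a subset covers every level shrinks fast, which is consistent with the doc's recommendation of the M method for models dominated by classification variables.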


@Lop wrote:

Dear @Rick_SAS and @PaigeMiller, thanks for replying.

So far I have tried the options you suggested and got the same problem. Perhaps it is related to the syntax I am using.

/*Option 1*/
proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
   model inbody_pbf=sittaqdum2 sittaqdum3 sittaqdum4 age bmi waist bsa exerc2 smokdum2 smokdum3 drinkdum2 drinkdum3
         jobdum2 jobdum3 jobdum4 jobdum5 jobdum6 jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum12 jobdum13 / diagnostics leverage (opc mcdinfo) ;
   output out=robrefsittapbfresw5 weight=wgt ;
   test sittaqdum2 sittaqdum3 ;
run ;

/*Option 2*/
proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
   class sdta_quart exerc2 smoking drinking a7_1 ;
   model inbody_pbf=sdta_quart age bmi waist bsa exerc2 smoking drinking a7_1 / diagnostics leverage (opc mcdinfo) ;
   output out=robrefsittapbfresw5 weight=wgt ;
   test sdta_quart ;
run ;

Sure would be nice if you showed us the relevant portions of your SASLOG, with error message.

The problem may be that your different categorical variables are perfectly correlated with one another, and so even by reducing the number of dummy variables by 1, or by using the CLASS statement, the matrix still can't be inverted.
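That last possibility, perfectly correlated categorical variables, is easy to reproduce in a toy example (Python/NumPy, made-up data): if two dummy columns coincide on every row, the design matrix stays rank deficient no matter which reference levels are dropped, and no parameterization can fix it.

```python
import numpy as np

# Hypothetical: two binary classifications that coincide on every subject
# (e.g., every subject in one smoking group is also in one drinking group).
group = np.array([0, 0, 1, 1, 0, 1], dtype=float)

smokdum2 = group   # dummy for one smoking level
drinkdum2 = group  # identical column: the variables are perfectly correlated

X = np.column_stack([np.ones(6), smokdum2, drinkdum2])
print(np.linalg.matrix_rank(X))  # 2, not 3: X'X cannot be inverted
```

Cross-tabulating the categorical variables against each other (e.g., with PROC FREQ) is a quick way to check for this kind of aliasing before modeling.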

--

Paige Miller
