Collinearity problem in robust regression

2 weeks ago - last edited 2 weeks ago

I would appreciate any insights that help me solve this problem.

I was carrying out a robust regression with continuous and categorical variables. For this, I transformed the categorical variables into dummy variables.

proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
model inbody_pbf=sdtaqdum2 sdtaqdum3 sdtaqdum4 age bmi waist bsa
   exerc2 smokdum2 smokdum3 drinkdum2 drinkdum3 jobdum1 jobdum2 jobdum3 jobdum4 jobdum5 jobdum6
   jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum13 / diagnostics leverage (opc mcdinfo) ;
output out=robrefsittapbfresw5 weight=wgt ;
test sdtaqdum2 sdtaqdum3 sdtaqdum4 ;
run ;

However, after adding more dummy variables (the ones shown in blue), I got the messages below.

My previous models did not have this problem. I even tried the CLASS statement, but the problem persisted.

WARNING: The design matrix is singular. Some regressors are dropped from the matrix. LEVERAGE is being computed on the reduced design matrix.

ERROR: The current MM estimation failed because a collinearity problem for a subset of the dataset occurred in its initial LTS estimation.

ERROR: Initial LTS estimator cannot be computed.

2 weeks ago

If you have 13 levels of JOB, then you need 12 (not 13) variables jobdum1 through jobdum12. Similarly for your other dummy variables. This eliminates the message.

Using ROBUSTREG will not solve any collinearity problems; ROBUSTREG is effective in the presence of outliers.
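
One way to see where the singularity comes from is to inspect the dummy columns before fitting the robust model. The following is only a sketch. It reuses the data set and variable names from the original post and assumes the dummies are coded 0/1 and that exerc2 is numeric. PROC MEANS shows how many observations fall in each level, and an ordinary least squares fit with PROC REG reports when the model is not full rank.

```sas
/* Sketch only: data set and variable names are taken from the original post. */

/* 1. For 0/1 dummies, SUM is the number of observations in that level.  */
/*    A column whose SUM is 0, or equals N, is constant and carries no   */
/*    information beyond the intercept.                                  */
proc means data=cpro.fstqwom_dum n sum min max;
   var sdtaqdum2 sdtaqdum3 sdtaqdum4 exerc2
       smokdum2 smokdum3 drinkdum2 drinkdum3
       jobdum1 jobdum2 jobdum3 jobdum4 jobdum5 jobdum6
       jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum13;
run;

/* 2. Fit the same regressors by ordinary least squares. If the design   */
/*    matrix is singular, PROC REG notes that the model is not full rank */
/*    and shows which parameters are linear combinations of others.      */
/*    VIF and TOL flag near-collinearity among the remaining columns.    */
proc reg data=cpro.fstqwom_dum;
   model inbody_pbf = sdtaqdum2 sdtaqdum3 sdtaqdum4 age bmi waist bsa exerc2
                      smokdum2 smokdum3 drinkdum2 drinkdum3
                      jobdum1 jobdum2 jobdum3 jobdum4 jobdum5 jobdum6
                      jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum13
                      / vif tol;
run;
quit;
```

If PROC REG fits the full model without a rank warning, the full design matrix is not the problem, and the failure is more likely to come from small subsets of the data, as the ERROR message about "a subset of the dataset" suggests.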

--

Paige Miller

Posted in reply to PaigeMiller

a week ago - last edited a week ago

Thanks @PaigeMiller for your answer. I did use just 12 dummy variables, omitting jobdum12 as the reference, but I still got the same disappointing results. As you mention, robust regression is meant for data with large outliers or high-leverage points; in this specific case, however, I ran into a collinearity problem that appeared after adding these last dummy variables. Previously, I ran the same regression with other dummies that had fewer categories, and it worked.

I would be grateful for any additional feedback. Greetings

a week ago

You have to make this change for all of your dummy variables, not just the ones for JOB. Or better yet, do what @Rick_SAS suggested.

--

Paige Miller

a week ago

PROC ROBUSTREG supports a CLASS statement. This feature was introduced in SAS 9.22. I suggest you list categorical variables in the CLASS and MODEL statements instead of generating your own dummy variables.

Posted in reply to Rick_SAS

a week ago

Dear @Rick_SAS and @PaigeMiller, thanks for replying.

I tried the option you suggested and got the same problem as described. Perhaps it is related to the syntax I am using.

/*Option 1*/
proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
model inbody_pbf=sittaqdum2 sittaqdum3 sittaqdum4 age bmi waist bsa exerc2 smokdum2 smokdum3 drinkdum2 drinkdum3
   jobdum2 jobdum3 jobdum4 jobdum5 jobdum6 jobdum7 jobdum8 jobdum9 jobdum10 jobdum11 jobdum12 jobdum13 / diagnostics leverage (opc mcdinfo) ;
output out=robrefsittapbfresw5 weight=wgt ;
test sittaqdum2 sittaqdum3 ;
run ;

/*Option 2*/
proc robustreg data=cpro.fstqwom_dum method=mm plots=all ;
class sdta_quart exerc2 smoking drinking a7_1 ;
model inbody_pbf=sdta_quart age bmi waist bsa exerc2 smoking drinking a7_1 / diagnostics leverage (opc mcdinfo) ;
output out=robrefsittapbfresw5 weight=wgt ;
test sdta_quart ;
run ;

With that said, let me ask a couple of questions.

Is there a way to control the reference level for categorical variables in robust regression, as there is in logistic regression?

Does SAS support categorical variables with many categories, as in this case?

Thanks once more for your valuable insights.

Lop,

a week ago

The MM method is initialized by using the LTS method, and that is the algorithm that is failing. Try

PROC ROBUSTREG method=MM(INITEST= S) ...

and maybe the S method will converge to an initial estimate.

Alternatively, if you stick with LTS as the initializer, you can try increasing the default H (breakdown) value. The syntax is

PROC ROBUSTREG method=MM(INITEST= LTS H=0.24) ...

where the H= value depends on the size of your data.
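
Written out in full against the CLASS-based model from "Option 2" earlier in the thread, the first suggestion might look like the sketch below. The data set and variable names are copied from that post; only the METHOD= specification changes. Note that the S estimate is also computed by subsampling, so it may run into the same difficulty when many indicator columns are present.

```sas
/* Sketch: same model as "Option 2", but the MM method is initialized */
/* with an S estimate instead of the default LTS estimate.            */
proc robustreg data=cpro.fstqwom_dum method=mm(initest=s) plots=all;
   class sdta_quart exerc2 smoking drinking a7_1;
   model inbody_pbf = sdta_quart age bmi waist bsa
                      exerc2 smoking drinking a7_1
                      / diagnostics leverage (opc mcdinfo);
   output out=robrefsittapbfresw5 weight=wgt;
   test sdta_quart;
run;
```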

If that doesn't work, you might need to change to the M method. I think the number of categorical variables is causing this problem. You only have 4 continuous variables whereas you have dozens of categorical levels. The algorithms for robust regression were created for continuous variables. Later people tried to extend them to support discrete variables, but as the ROBUSTREG doc says:

Note: Because the LTS and S methods use subsampling algorithms, these methods are not suitable in an analysis that uses variables that have only a few unequal values. ... For example, indicator variables that correspond to a classification variable often fall into this category. The same issue also applies to the initial LTS and S estimates in the MM method. For a model that includes classification independent variables or continuous independent variables with a few unequal values, the M method is recommended.
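
Following that recommendation, a fit with the M method could be sketched as below. This is an illustration under the same assumptions as before (data set and variable names copied from "Option 2" in the thread), not a tested solution for this data. M estimation does not use subsampling, so it avoids the failing initial step. It protects mainly against outliers in the response rather than against high-leverage points, which is why the DIAGNOSTICS and LEVERAGE output is still worth reviewing.

```sas
/* Sketch: the CLASS-based model from "Option 2", fitted with M       */
/* estimation, which needs no LTS or S initial estimate.              */
proc robustreg data=cpro.fstqwom_dum method=m plots=all;
   class sdta_quart exerc2 smoking drinking a7_1;
   model inbody_pbf = sdta_quart age bmi waist bsa
                      exerc2 smoking drinking a7_1
                      / diagnostics leverage;
   output out=robrefsittapbfresw5 weight=wgt;
   test sdta_quart;   /* robust test of the quartile effect */
run;
```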

a week ago - last edited a week ago

Sure would be nice if you showed us the relevant portions of your SASLOG, with error message.

The problem may be that your different categorical variables are perfectly correlated with one another, and so even by reducing the number of dummy variables by 1, or by using the CLASS statement, the matrix still can't be inverted.
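
A quick way to check for this is to cross-tabulate the categorical variables against one another. The sketch below assumes the variable names from "Option 2" and that a7_1 is the job variable, which the thread does not state explicitly. Empty or nearly empty cells, or tables in which every row has a single nonzero column, indicate levels that are confounded with each other.

```sas
/* Sketch: pairwise cell counts among the categorical predictors.     */
/* MISSING counts missing values as a level so nothing is hidden.     */
proc freq data=cpro.fstqwom_dum;
   tables a7_1*(sdta_quart exerc2 smoking drinking)
          sdta_quart*(exerc2 smoking drinking)
          / norow nocol nopercent missing;
run;
```

Job categories with only a handful of observations are natural candidates for merging before refitting.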

--

Paige Miller
