- Regression with large number of fixed effects in a...

04-20-2014 09:23 PM

I would like to run a regression that includes about 2500 dummy variables (or fixed effects). The data set includes about 450,000 observations, and it is very sparse: most observations have only one or two effects "turned on", so only about 0.05% of the entries in the design matrix are ones.
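As a rough check on these figures, taking the stated counts at face value, the size of the design matrix and the implied number of nonzero entries work out as follows:

$$
\begin{aligned}
450{,}000 \times 2{,}500 &= 1.125 \times 10^{9}\ \text{cells},\\
0.0005 \times 1.125 \times 10^{9} &= 562{,}500\ \text{ones},\\
562{,}500 \,/\, 450{,}000 &= 1.25\ \text{ones per observation},
\end{aligned}
$$

which is consistent with one or two effects being turned on per observation.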

(Interestingly, when I created this matrix in SAS 9.4 on a Windows machine, the resulting file was about 4.5GB. When I transferred it to Unix, it became a 30MB file. I was surprised that whatever magic sauce SAS uses to store the sparse matrix on Unix is not used on Windows.)

I'm wondering what the best way to estimate a model like this is. Here are some possibilities that I'm aware of, and I'm looking for guidance on which is likely to be the most efficient approach:

1) Use **proc hpmixed**. Given the sparse nature of the data this seemed like a good way to go. But I've been running this model for 11 hours and it hasn't finished yet. I'm wondering if perhaps I've implemented it wrong. My code is:

    proc hpmixed;
       class fid;
       model r = size fid dummy1-dummy2500;
    run;

2) Use **IML**. I thought perhaps I could read the sparse matrix into IML and use **solvelin** to estimate the coefficients.
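For a sense of what option 2 might look like, here is a minimal SAS/IML sketch on a toy problem. It assumes the coefficients are obtained from the normal equations (X'X)b = X'y, and that the SPARSE function and the SOLVELIN call are available in your SAS/IML release; the data values are made up, and the argument order should be checked against the SAS/IML documentation before relying on it.

```sas
proc iml;
   /* Toy data (made up): intercept plus two dummy columns */
   X = {1 1 0,
        1 0 1,
        1 0 0,
        1 1 0,
        1 0 1,
        1 0 0};
   y = {2.0, 3.1, 0.9, 2.2, 2.9, 1.1};

   XtX = t(X) * X;               /* cross-products matrix (symmetric) */
   Xty = t(X) * y;               /* right-hand side of the normal equations */

   /* Convert to sparse storage of a symmetric matrix */
   A = sparse(XtX, "sym");

   /* Direct sparse solve of A * beta = Xty */
   call solvelin(beta, status, A, Xty, "LDL");

   print status beta;            /* status reports whether the solve succeeded */
quit;
```

This toy version forms the dense X first, which would defeat the purpose at 450,000 by 2500; with the real data the sparse cross-products would need to be accumulated directly from the nonzero entries.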

Is one of these likely to be the best approach? Are there other procedures that would work well?

Thanks!

Posted in reply to stoffprof

04-21-2014 12:07 AM

Posted in reply to Reeza

04-21-2014 09:45 AM

Thanks, but it seems that this is only about sparse matrices in text mining or the Enterprise Miner tool. I haven't seen anything about their use in a standard regression.

Posted in reply to stoffprof

04-22-2014 09:44 AM

You should try the HPREG procedure. It is designed specifically for high-dimensional fixed-effects modeling. It is only found in the newer releases of SAS.

04-23-2014 03:48 PM

I would go with your first inclination, HPMIXED, which employs sparse matrix algorithms. I have not tried HPREG, but the documentation for yet another high performance proc (HPLMIXED) indicates that HPMIXED "is particularly suited for problems in which the [**XZ**]'[**XZ**] crossproducts matrix is sparse." That sounds exactly like what is going on here. And while HPREG offers a lot of capability, it looks like it depends more on multithreading/parallel processing than on sparse matrix techniques.

My question concerns dummy1 to dummy2500, which seem difficult to work with. Are these dummies the result of more easily defined class variables, such that you can use the CLASS statement to "auto-populate" the levels? If not, and the data set is already prepped, I would go with your first inclination.

Steve Denham
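If the dummies do come from a categorical variable, the CLASS-statement version of the original model might look like the sketch below. The data set name `work.regdata` and the variable name `effect_id` are hypothetical placeholders, and the sketch only applies if each group of dummies can be recovered as the levels of one such variable.

```sas
/* Sketch only: work.regdata and effect_id are hypothetical names. */
proc hpmixed data=work.regdata;
   /* CLASS variables are expanded into their levels internally,
      so the 2500 dummy columns need not exist in the data set */
   class fid effect_id;

   /* SOLUTION requests the fixed-effect parameter estimates */
   model r = size fid effect_id / solution;
run;
```

Since most observations have one or two effects turned on, more than one such variable may be needed, each listed in both the CLASS and MODEL statements.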