<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Regression with large number of fixed effects in a sparse matrix in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186977#M9718</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You should try the HPREG procedure. It is designed specifically for high-dimensional fixed-effects modeling. It is only available in the newer releases of SAS.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 22 Apr 2014 13:44:57 GMT</pubDate>
    <dc:creator>lvm</dc:creator>
    <dc:date>2014-04-22T13:44:57Z</dc:date>
    <item>
      <title>Regression with large number of fixed effects in a sparse matrix</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186974#M9715</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would like to run a regression that includes about 2500 dummy variables (or fixed effects). The data set includes about 450,000 observations, and it is very sparse: most observations have only one or two effects "turned on"; in other words, only about 0.05% of the entries in the design matrix are ones.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;(Interestingly, when I created this matrix in SAS 9.4 on a Windows machine, it produced a file of about 4.5GB. When I transferred it to Unix, it turned into a 30MB file. I was surprised that whatever magic sauce SAS uses to store the sparse matrix on Unix, it isn't using on Windows.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm wondering what the best way to estimate a model like this is. Here are some possibilities that I'm aware of, and I'm looking for guidance on which is likely to be the most efficient approach:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;1) Use &lt;STRONG&gt;proc hpmixed&lt;/STRONG&gt;. Given the sparse nature of the data, this seemed like a good way to go. But I've been running this model for 11 hours and it hasn't finished yet. I'm wondering if perhaps I've implemented it wrong. My code is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="padding-left: 60px;"&gt;&lt;SPAN style="font-family: 'courier new', courier;"&gt;proc hpmixed;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="padding-left: 60px;"&gt;&lt;SPAN style="font-family: 'courier new', courier;"&gt;&amp;nbsp; class fid;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="padding-left: 60px;"&gt;&lt;SPAN style="font-family: 'courier new', courier;"&gt;&amp;nbsp; model r = size fid dummy1-dummy2500;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="padding-left: 60px;"&gt;&lt;SPAN style="font-family: 'courier new', courier;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;2) Use &lt;STRONG&gt;IML&lt;/STRONG&gt;. I thought perhaps I could read the sparse matrix into IML and use &lt;STRONG&gt;solvelin&lt;/STRONG&gt; to estimate the coefficients.&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;/P&gt;&lt;P&gt;Is one of these likely to be the best approach? Are there other procedures that would work well?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 21 Apr 2014 01:23:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186974#M9715</guid>
      <dc:creator>stoffprof</dc:creator>
      <dc:date>2014-04-21T01:23:16Z</dc:date>
    </item>
    <item>
      <title>Re: Regression with large number of fixed effects in a sparse matrix</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186975#M9716</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There was an article on here in the past week about sparse matrices and how to use them.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Unfortunately I can't find the link, but if you look through the past two weeks I'm sure you'll find it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;EDIT: Found the link:&lt;/P&gt;&lt;P&gt;&lt;A __default_attr="5323" __jive_macro_name="document" class="jive_macro jive_macro_document" href="https://communities.sas.com/"&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 21 Apr 2014 04:07:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186975#M9716</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-04-21T04:07:42Z</dc:date>
    </item>
    <item>
      <title>Re: Regression with large number of fixed effects in a sparse matrix</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186976#M9717</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks, but it seems that this is only about sparse matrices in text mining or the Enterprise Miner tool. I haven't seen anything about their use in a standard regression.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 21 Apr 2014 13:45:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186976#M9717</guid>
      <dc:creator>stoffprof</dc:creator>
      <dc:date>2014-04-21T13:45:48Z</dc:date>
    </item>
    <item>
      <title>Re: Regression with large number of fixed effects in a sparse matrix</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186977#M9718</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You should try the HPREG procedure. It is designed specifically for high-dimensional fixed-effects modeling. It is only available in the newer releases of SAS.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 22 Apr 2014 13:44:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186977#M9718</guid>
      <dc:creator>lvm</dc:creator>
      <dc:date>2014-04-22T13:44:57Z</dc:date>
    </item>
    <item>
      <title>Re: Regression with large number of fixed effects in a sparse matrix</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186978#M9719</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would go with your first inclination towards HPMIXED, which employs sparse matrix algorithms. I have not tried HPREG, but the documentation for yet another high-performance proc (HPLMIXED) indicates that HPMIXED "is particularly suited for problems in which the [&lt;STRONG&gt;XZ&lt;/STRONG&gt;]'[&lt;STRONG&gt;XZ&lt;/STRONG&gt;] crossproducts matrix is sparse." That sounds exactly like what is going on here. While HPREG offers a lot of capability, it looks like it depends more on multithreading/parallel processing than on sparse matrix techniques.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My question is: dummy1 to dummy2500 seems difficult. Are these dummies the result of more easily defined class variables, such that you can use the class statement to "auto-populate" the levels? If not, and the data set is already prepped, I would go with your first inclination.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 23 Apr 2014 19:48:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-with-large-number-of-fixed-effects-in-a-sparse-matrix/m-p/186978#M9719</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2014-04-23T19:48:22Z</dc:date>
    </item>
  </channel>
</rss>

