<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Regression: Binary Response and 100,000s of Class Levels in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930577#M46379</link>
    <description>&lt;P&gt;You can fit a fixed-effects conditional logistic model in PROC LOGISTIC by putting your account number variable in the STRATA statement. This conditions the account-level parameters out of the likelihood, so they are never estimated. Another option is a GEE model: specify that variable in the SUBJECT= option of the REPEATED statement in PROC GEE. The GEE method does not require repeated measurements on the levels of the variable.&lt;/P&gt;</description>
    <pubDate>Sun, 02 Jun 2024 21:37:08 GMT</pubDate>
    <dc:creator>StatDave</dc:creator>
    <dc:date>2024-06-02T21:37:08Z</dc:date>
    <item>
      <title>Regression: Binary Response and 100,000s of Class Levels</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930572#M46376</link>
      <description>&lt;P&gt;I am running a regression having a binary response variable and need to estimate fixed effects for hundreds of thousands of class levels (eleven_digit_account_id). Each code below produces hundreds of thousands of coefficients, one for each class level, overwhelming system resources. Is there a way to suppress class-level coefficients in any of these procs, or is there another proc that can handle a binary response variable with hundreds of thousands of class levels?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt; &lt;STRONG&gt;glimmix&lt;/STRONG&gt; data=summary_statistics NOCLPRINT;&lt;/P&gt;&lt;P&gt;class eleven_digit_account_id;&lt;/P&gt;&lt;P&gt;model bet_win = net_stake last_round_profit miles_fan_bet_team eleven_digit_account_id/noint solution dist=bin link=logit;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt; &lt;STRONG&gt;genmod&lt;/STRONG&gt; data=summary_statistics descending;&lt;/P&gt;&lt;P&gt;class eleven_digit_account_id;&lt;/P&gt;&lt;P&gt;model bet_win = net_stake last_round_profit miles_fan_bet_team eleven_digit_account_id/noint dist=bin link=logit;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt; &lt;STRONG&gt;logistic&lt;/STRONG&gt; data=summary_statistics;&lt;/P&gt;&lt;P&gt;class eleven_digit_account_id;&lt;/P&gt;&lt;P&gt;model bet_win (event='1') = net_stake last_round_profit miles_fan_bet_team eleven_digit_account_id/noint;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jun 2024 20:33:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930572#M46376</guid>
      <dc:creator>rick_b</dc:creator>
      <dc:date>2024-06-02T20:33:04Z</dc:date>
    </item>
    <item>
      <title>Re: Regression: Binary Response and 100,000s of Class Levels</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930574#M46377</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/465881"&gt;@rick_b&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am running a regression having a binary response variable and need to estimate fixed effects for hundreds of thousands of class levels (eleven_digit_account_id).&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Right here I am skeptical; I don't think this is a good thing to do. Why do you need fixed effects for each level of account_id?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Each code below produces hundreds of thousands of coefficients, one for each class level, overwhelming system resources.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Overwhelming disk space? Or overwhelming memory? Or something else? Which SAS are you using anyway: Viya, Base SAS, or something else?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is a High Performance version of PROC LOGISTIC; it's called PROC HPLOGISTIC. As I understand it, this speeds up the calculations by using distributed processing. As far as I know, it doesn't really overcome a limitation of system resources.&lt;/P&gt;
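&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For reference, a minimal sketch of what a PROC HPLOGISTIC call might look like, reusing the data set and variable names from the original post. The PERFORMANCE statement and the NTHREADS=4 value are illustrative assumptions, not settings anyone in this thread has tested. The model still estimates one parameter per account ID, so it would not be expected to get around a memory limit:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: same model as the PROC LOGISTIC step in the question */
proc hplogistic data=summary_statistics;
   /* account ID is still a CLASS effect, so every level gets a parameter */
   class eleven_digit_account_id;
   model bet_win(event='1') = net_stake last_round_profit
                              miles_fan_bet_team eleven_digit_account_id;
   /* NTHREADS=4 is an assumed value; DETAILS requests a timing table */
   performance nthreads=4 details;
run;&lt;/CODE&gt;&lt;/PRE&gt;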
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So we are back to my first question: why do you need each of the 100,000 account IDs to be treated individually in the model? How does this improve the model?&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jun 2024 20:42:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930574#M46377</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-06-02T20:42:12Z</dc:date>
    </item>
    <item>
      <title>Re: Regression: Binary Response and 100,000s of Class Levels</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930575#M46378</link>
      <description>&lt;P&gt;Finance referees nearly always expect fixed effects. You have probably seen many SAS Community examples where finance researchers use firm fixed effects (typically each unique firm is identified by its gvkey). Here, eleven_digit_account_id is a unique customer ID, much like gvkey is a unique company ID. In the past, when studying firms, I have used proc glm in combination with the absorb statement, but my understanding is that glm isn't designed to handle binary responses.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using SAS 9.4. Without all those unique IDs (for example, if I stick to proc logistic with a strata eleven_digit_account_id statement), the regression takes about 2.5 hours to run. When eleven_digit_account_id is included in the class and model statements, my computer stops responding. It was my understanding that in proc logistic the strata statement can be used to specify fixed effects (&lt;A href="https://communities.sas.com/t5/Statistical-Procedures/Suitable-quot-proc-quot-for-a-model-with-Dummy-dependent-and/td-p/541784" target="_blank"&gt;https://communities.sas.com/t5/Statistical-Procedures/Suitable-quot-proc-quot-for-a-model-with-Dummy-dependent-and/td-p/541784&lt;/A&gt;), but in smaller subsample tests I get different results when I use the strata statement vs. when I use class and place the variable in the model statement, so I question whether the two approaches are equivalent.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jun 2024 21:26:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930575#M46378</guid>
      <dc:creator>rick_b</dc:creator>
      <dc:date>2024-06-02T21:26:23Z</dc:date>
    </item>
    <item>
      <title>Re: Regression: Binary Response and 100,000s of Class Levels</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930577#M46379</link>
      <description>&lt;P&gt;You can fit a fixed-effects conditional logistic model in PROC LOGISTIC by putting your account number variable in the STRATA statement. This conditions the account-level parameters out of the likelihood, so they are never estimated. Another option is a GEE model: specify that variable in the SUBJECT= option of the REPEATED statement in PROC GEE. The GEE method does not require repeated measurements on the levels of the variable.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jun 2024 21:37:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Regression-Binary-Response-and-100-000s-of-Class-Levels/m-p/930577#M46379</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2024-06-02T21:37:08Z</dc:date>
    </item>
  </channel>
</rss>

