rick_b
Fluorite | Level 6

I am running a regression with a binary response variable and need to estimate fixed effects for hundreds of thousands of class levels (eleven_digit_account_id). Each of the programs below produces hundreds of thousands of coefficients, one for each class level, which overwhelms system resources. Is there a way to suppress the class-level coefficients in any of these procs, or is there another proc that can handle a binary response variable with hundreds of thousands of class levels?

 

proc glimmix data=summary_statistics NOCLPRINT;
  class eleven_digit_account_id;
  model bet_win = net_stake last_round_profit miles_fan_bet_team eleven_digit_account_id / noint solution dist=bin link=logit;
run;
quit;

 

 

proc genmod data=summary_statistics descending;
  class eleven_digit_account_id;
  model bet_win = net_stake last_round_profit miles_fan_bet_team eleven_digit_account_id / noint dist=bin link=logit;
run;
quit;

 

 

proc logistic data=summary_statistics;
  class eleven_digit_account_id;
  model bet_win (event='1') = net_stake last_round_profit miles_fan_bet_team eleven_digit_account_id / noint;
run;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

You can fit a fixed-effects conditional logistic model in PROC LOGISTIC by putting your account number variable in the STRATA statement. This conditions those parameters out of the likelihood so that they are not estimated. Another option is a GEE model, specifying that variable in the SUBJECT= option of the REPEATED statement in PROC GEE. Repeated measurements on the levels of the variable are not required to use the GEE method.
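
As a rough sketch of the two approaches described above, reusing the data set and variable names from the question: the first step moves the account ID from the CLASS and MODEL statements to the STRATA statement, and the second fits a GEE with the account ID as the SUBJECT= effect. The TYPE=EXCH working correlation is only an assumption for illustration and was not specified in this thread. Neither step has been run against the poster's data.

```sas
/* Conditional (fixed-effects) logistic regression.                     */
/* The account-level intercepts are conditioned out of the likelihood,  */
/* so no coefficient is estimated or printed per account.               */
proc logistic data=summary_statistics;
  strata eleven_digit_account_id;
  model bet_win(event='1') = net_stake last_round_profit miles_fan_bet_team;
run;

/* GEE alternative: accounts define the clusters.                       */
/* The SUBJECT= variable must also appear in the CLASS statement.       */
/* TYPE=EXCH is an assumed working correlation structure; TYPE=IND      */
/* (independence) is the default if TYPE= is omitted.                   */
proc gee data=summary_statistics descending;
  class eleven_digit_account_id;
  model bet_win = net_stake last_round_profit miles_fan_bet_team
        / dist=bin link=logit;
  repeated subject=eleven_digit_account_id / type=exch;
run;
```

Note that in the conditional model, accounts whose bet_win values are all 0 or all 1 contribute nothing to the conditional likelihood, so the estimates are driven by accounts with both outcomes.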


3 REPLIES 3
PaigeMiller
Diamond | Level 26

@rick_b wrote:

I am running a regression having a binary response variable and need to estimate fixed effects for hundreds of thousands of class levels (eleven_digit_account_id).


Right here I am skeptical. I don't really think this is a good thing to do. Why do you need fixed effects for each level of account_id?

 

Each code below produces hundreds of thousands of coefficients, one for each class level, overwhelming system resources.

 

Overwhelming disk space? Or overwhelming memory? Or something else? What SAS are you using anyway? Viya, Base SAS, something else?

 

There is a High Performance version of PROC LOGISTIC; it's called PROC HPLOGISTIC. As I understand it, this speeds up the calculations by using distributed processing. As far as I know, it doesn't really overcome a limitation of system resources.

 

So we are back to my first question: why do you need each of the 100,000 account IDs to be treated individually in the model? How does this improve the model?

--
Paige Miller
rick_b
Fluorite | Level 6

Finance referees nearly always expect fixed effects. You have probably seen many SAS Community examples where finance researchers use firm fixed effects (typically each unique firm is identified by its gvkey). Here, eleven_digit_account_id is a unique customer ID, much like gvkey is a unique company ID. In the past, when studying firms, I have used proc glm in combination with the absorb statement, but my understanding is that glm isn't designed to handle binary responses.

 

I am using SAS 9.4. Without all those unique IDs (for example if I stick to proc logistic with a strata eleven_digit_account_id statement) the regression takes about 2.5 hours to run. When eleven_digit_account_id is included in the class and model statements, my computer stops responding. It was my understanding that in proc logistic strata can be used to specify fixed effects (https://communities.sas.com/t5/Statistical-Procedures/Suitable-quot-proc-quot-for-a-model-with-Dummy...), but in smaller subsample tests I get different results when I use the strata statement vs. when I use class and place the variable in the model statement, so I question whether it is the same.
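
The comparison described above could be set up along the following lines. This is only a sketch: summary_statistics_sub is a hypothetical name for the smaller subsample, and pe_dummy and pe_strata are hypothetical output data set names. Some difference between the two sets of estimates is to be expected, because the STRATA version maximizes a conditional likelihood while the CLASS version maximizes the full likelihood with one dummy per account; the dummy-variable estimates are generally considered unreliable when each account has only a few observations.

```sas
/* Unconditional ML: one coefficient per account (feasible only on a    */
/* small subsample). summary_statistics_sub is an assumed data set.     */
proc logistic data=summary_statistics_sub;
  class eleven_digit_account_id;
  model bet_win(event='1') = net_stake last_round_profit miles_fan_bet_team
        eleven_digit_account_id / noint;
  ods output ParameterEstimates=pe_dummy;
run;

/* Conditional ML: account effects conditioned out via STRATA.          */
proc logistic data=summary_statistics_sub;
  strata eleven_digit_account_id;
  model bet_win(event='1') = net_stake last_round_profit miles_fan_bet_team;
  ods output ParameterEstimates=pe_strata;
run;

/* Print only the covariate rows from the dummy-variable fit so they    */
/* can be compared with the three rows in pe_strata.                    */
proc print data=pe_dummy;
  where lowcase(Variable) in
        ('net_stake', 'last_round_profit', 'miles_fan_bet_team');
run;

proc print data=pe_strata;
run;
```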

