How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Regular Contributor
Posts: 161


Dear All:

I need to use zero-truncated negative binomial regression to analyze a panel data set.  Is there a way to include fixed effects in models like this?  Thanks -

The panel data are similar to the web-visit log of a library book-finder system; see the sample table below.  For a given time period, I can calculate, for each guest, how many times he or she browsed for books by a given author.  For example, I can calculate that guest_A accessed the book-finder main page for author2 4 times during 2008Q1.  This would be one observation in my regression.  I am trying to use a zero-truncated count model because the system does not register any visit history until a guest actually visits the book finder.  I don't observe the whole population, because a guest may visit another library or may not be interested in reading books at all.  Therefore, the minimum count is 1.
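
For reference, here is a sketch of the distribution being asked about, assuming the common NB2 parameterization. The symbols are not from the post: \mu_{it} is the mean of the untruncated count for observation i in period t, \alpha > 0 is the dispersion, and x_{it}, \beta, c_i are a hypothetical covariate vector, its coefficients, and a fixed effect. Conditioning on Y > 0 rescales the ordinary negative binomial probabilities by the probability of a nonzero count.

```latex
P(Y_{it} = y \mid Y_{it} > 0)
  = \frac{\dfrac{\Gamma(y + 1/\alpha)}{\Gamma(y + 1)\,\Gamma(1/\alpha)}
          \left(\dfrac{1}{1 + \alpha\mu_{it}}\right)^{1/\alpha}
          \left(\dfrac{\alpha\mu_{it}}{1 + \alpha\mu_{it}}\right)^{y}}
         {1 - (1 + \alpha\mu_{it})^{-1/\alpha}},
  \qquad y = 1, 2, \dots

\log \mu_{it} = x_{it}'\beta + c_i
```

The denominator is 1 - P(Y_{it} = 0), so the probabilities over y = 1, 2, ... sum to one.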

There are close to 1 million observations in my sample.  The PROC FREQ output for the count is also included below, after the sample table.

guest_ID  Author_ID    book_type  checkout_date
guest_A   author1      A          01012008
guest_A   author1      B          01032009
guest_A   author2      A          12112009
guest_J   author2      A          10112006
guest_J   author232    C          10132007
guest_K   author232    C          07092009
guest_K   author243    E          05082009
guest_T   author9870   G          01022007
guest_T   author9870   G          01032008
guest_T   author9871   G          01042010
guest_E   author9871   A          01022011
guest_E   author9871   A          03022011
guest_E   author9871   A          02302011
guest_E   author98790  K          03302011
guest_E   author98790  D          04202011
guest_E   author98795  E          04212011

count  Frequency  Percent
    1     515811    56.46
    2     143579    15.71
    3      70331     7.7
    4      39976     4.38
    5      25729     2.82
    6      18988     2.08
    7      14010     1.53
    8      10803     1.18
    9       8688     0.95
   10       7009     0.77
   11       5738     0.63
   12       4544     0.5
   13       3907     0.43
   14       3483     0.38
   15       2867     0.31
   16       2523     0.28
   17       2364     0.26
   18       2011     0.22
   19       1682     0.18
   20       1554     0.17
   21       1432     0.16
   22       1203     0.13
   23       1103     0.12
   24       1051     0.12
   25        920     0.1
   26        810     0.09
   27        757     0.08
   28        770     0.08
   29        609     0.07
   30        630     0.07
   31        544     0.06
   32        542     0.06
   33        516     0.06
   34        473     0.05
   35        436     0.05
   36        420     0.05
   37        372     0.04
   38        452     0.05
   39        346     0.04
   40        334     0.04
   41        327     0.04
   42        282     0.03
  ...
 1265          1     0
 1291          1     0
 1344          1     0
 1347          1     0
 1350          1     0
 1548          1     0
 1621          1     0
 1622          1     0
 1751          1     0
 1884          1     0
 1895          1     0
 2150          1     0
 2586          1     0
 3703          1     0
Respected Advisor
Posts: 2,655

Re: How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Posted in reply to caveman529

If by zero truncated you mean there are no counts equal to zero by design, then I think what you might have to do is log transform the counts before using PROC PANEL.  I don't see a way to specify a distribution in the documentation.

However, I may be jumping the gun.  What are the panel data?  There may be a way to specify a model in PROC QLIM that could address this.

Steve Denham

Regular Contributor
Posts: 161

Re: How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Posted in reply to SteveDenham

Hi, Steve:

Thank you for your suggestion.  I updated some information on my question above.  Could you share some of the insights you might have for dealing with this situation?  I hope to include fixed effects into my model in order to mitigate omitted variable bias.

Respected Advisor
Posts: 2,655

Re: How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Posted in reply to caveman529

I come from a different statistical background, so you will have to help me out some.  Why is this considered panel data?  I may be missing something in the proc freq output--is this ignoring time periods?  Is the presence of time as a factor what makes this panel data?  If so, then a repeated measures in time using TCOUNTREG might be an option, as might GLIMMIX, using a gamma distribution.

Sounds like QLIM is not the way to go, if time is a factor, and there is correlation in the counts between time intervals for the subjects.  It just doesn't address repeated measures well (yet).

Steve Denham

Regular Contributor
Posts: 161

Re: How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Posted in reply to SteveDenham

Hi, Steve:

Many thanks for your valuable suggestions.  I calculated the count myself, as shown below, and use it as a measure of readers' interest in the work of a particular author.  table1 is the first table in this thread.

data table2;
  set table1;
  /* date is assumed to be a numeric SAS date value (checkout_date in the sample table) */
  year = year(date);
  qtr = qtr(date);
  /* CATS strips blanks and concatenates, e.g. 2008Q1 */
  yearqtr = cats(year, 'Q', qtr);
run;

/* Count the number of times a guest searched for books by a particular author (I don't care about the kind of book at this point) */
proc sql;
  create table table3 as
  select guest_id, yearqtr, author_id, count(*) as count
  from table2
  group by guest_id, yearqtr, author_id
  order by guest_id, yearqtr, author_id;
quit;

/* yearqtr ignored for brevity */
proc freq data=table3;
  tables count;
run;

Sorry about the confusion.  I checked the TCOUNTREG manual for 9.4, and it doesn't seem to cover fixed-effects models for panel count data in which the zeros are truncated.  Could you expand a little on using the gamma distribution in this situation?  Thanks -

Respected Advisor
Posts: 2,655

Re: How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Posted in reply to caveman529

Well, the gamma distribution is just a continuous distribution with a skew, which is what your data show.  Your data would be an almost perfect fit to an exponential distribution if the count variable were replaced by count-1, and the exponential is a special case of the gamma.  But you are lucky in a sense, because GLIMMIX requires a link of some kind between the original scale and the model scale, and for a gamma this is a log function.  Since every count is at least 1, the log is defined for all of your observations.
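
Written out, the fixed part of a gamma model with a log link looks roughly like the following. This is only a sketch, and the notation is an assumption, not from the thread: \mu_{it} is the mean count for guest i in period t, \tau_t is the period effect, and \phi is the scale parameter.

```latex
\log \mu_{it} = \beta_0 + \tau_t,
\qquad
\operatorname{Var}(Y_{it}) = \phi\,\mu_{it}^{2}
```

The exponential distribution corresponds to the special case \phi = 1.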

The following fits a fixed repeated effect of timeperiod with a response of count, summarized by timeperiod and guest_id.

Starter code:

proc glimmix data=yourdatasummarizedbytimeperiodandguest_id;
  class timeperiod guest_id;
  model count = timeperiod / dist=gamma; /* this could be dist=negbin for a negative binomial */
  random timeperiod / subject=guest_id type=cs; /* I am going with CS here, only because I don't know the time spacing, or much about the process generating the data */
run;
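
If the zero truncation itself needs to be modeled, one possible route is to code the zero-truncated negative binomial log-likelihood by hand in PROC NLMIXED, with the fixed effects entered as indicator variables. The following is only a sketch under stated assumptions: the data set table4, the year indicators y2007 to y2011, and the parameter names are hypothetical, and the years 2006 to 2011 are taken from the sample table earlier in the thread. It uses year effects only; one indicator per guest would not be practical with a data set of this size.

```sas
/* Hypothetical step: build year indicators from yearqtr (2006 is the reference year) */
data table4;
  set table3;
  yr = input(substr(yearqtr, 1, 4), 4.);
  y2007 = (yr = 2007);
  y2008 = (yr = 2008);
  y2009 = (yr = 2009);
  y2010 = (yr = 2010);
  y2011 = (yr = 2011);
run;

proc nlmixed data=table4;
  /* starting values; alpha is the NB2 dispersion parameter */
  parms b0=0 b2007=0 b2008=0 b2009=0 b2010=0 b2011=0 alpha=1;
  bounds alpha > 0;
  /* log link: intercept plus year fixed effects */
  eta = b0 + b2007*y2007 + b2008*y2008 + b2009*y2009
           + b2010*y2010 + b2011*y2011;
  mu = exp(eta);
  m  = 1/alpha;
  /* NB2 log-likelihood minus log P(count > 0), which removes the zero cell */
  ll = lgamma(count + m) - lgamma(count + 1) - lgamma(m)
       + count*log(alpha*mu) - (count + m)*log(1 + alpha*mu)
       - log(1 - (1 + alpha*mu)**(-m));
  model count ~ general(ll);
run;
```

Additional covariates would enter eta in the same way as the year indicators. With close to a million rows the optimization may be slow.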

Steve Denham


Regular Contributor
Posts: 161

Re: How to include FIXED effect when using zero-truncated negative binomial regression on panel data?

Posted in reply to SteveDenham

Thank you so much for your suggestions, Steve.  They are very helpful.  I'll give it a try!
