- How to include FIXED effect when using zero-trunca...

08-04-2013 01:07 AM

Dear All:

I need to use **zero-truncated negative binomial regression** to analyze panel data. Is there a way to include **fixed effects** in models like this? Thanks -

The panel data are similar to the web-visit log of a library book-finder system; see the table below. For a given time period, I can calculate, for each guest, how many times he or she browsed for books by a given author. For example, I can calculate that guest_A accessed the library main page for author2 4 times during 2008Q1. This would be one observation in my regression. I am trying to use a zero-truncated count model because the system does not register any visit history until a guest actually visits the book finder. I don't have the whole population, because a guest can visit another library or may not be interested in reading books at all. Therefore, the minimum count is 1.
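For reference, a zero-truncated negative binomial conditions an ordinary negative binomial on the count being positive. The following is a sketch in the common NB2 parameterization; the mean $\mu_i = \exp(x_i'\beta)$ and the dispersion $\alpha$ are notation assumed here, not taken from the post:

$$
P(Y_i = y \mid Y_i > 0) = \frac{f(y;\mu_i,\alpha)}{1 - f(0;\mu_i,\alpha)}, \qquad y = 1, 2, \dots
$$

$$
f(y;\mu,\alpha) = \frac{\Gamma(y + 1/\alpha)}{\Gamma(y+1)\,\Gamma(1/\alpha)}
\left(\frac{\alpha\mu}{1+\alpha\mu}\right)^{y}
\left(\frac{1}{1+\alpha\mu}\right)^{1/\alpha},
\qquad
f(0;\mu,\alpha) = (1+\alpha\mu)^{-1/\alpha}.
$$

Dividing by $1 - f(0;\mu_i,\alpha)$ rescales the probabilities so that they sum to one over $y \ge 1$, which matches a log that never records a zero.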

There are close to 1 million observations in my sample. I also included PROC FREQ output for the count below the table.

guest_ID | Author_ID | book_type | checkout_date |
---|---|---|---|
guest_A | author1 | A | 01012008 |
guest_A | author1 | B | 01032009 |
guest_A | author2 | A | 12112009 |
guest_J | author2 | A | 10112006 |
guest_J | author232 | C | 10132007 |
guest_K | author232 | C | 07092009 |
guest_K | author243 | E | 05082009 |
guest_T | author9870 | G | 01022007 |
guest_T | author9870 | G | 01032008 |
guest_T | author9871 | G | 01042010 |
guest_E | author9871 | A | 01022011 |
guest_E | author9871 | A | 03022011 |
guest_E | author9871 | A | 02302011 |
guest_E | author98790 | K | 03302011 |
guest_E | author98790 | D | 04202011 |
guest_E | author98795 | E | 04212011 |

count | Frequency | Percent |
---|---|---|
1 | 515811 | 56.46 |
2 | 143579 | 15.71 |
3 | 70331 | 7.7 |
4 | 39976 | 4.38 |
5 | 25729 | 2.82 |
6 | 18988 | 2.08 |
7 | 14010 | 1.53 |
8 | 10803 | 1.18 |
9 | 8688 | 0.95 |
10 | 7009 | 0.77 |
11 | 5738 | 0.63 |
12 | 4544 | 0.5 |
13 | 3907 | 0.43 |
14 | 3483 | 0.38 |
15 | 2867 | 0.31 |
16 | 2523 | 0.28 |
17 | 2364 | 0.26 |
18 | 2011 | 0.22 |
19 | 1682 | 0.18 |
20 | 1554 | 0.17 |
21 | 1432 | 0.16 |
22 | 1203 | 0.13 |
23 | 1103 | 0.12 |
24 | 1051 | 0.12 |
25 | 920 | 0.1 |
26 | 810 | 0.09 |
27 | 757 | 0.08 |
28 | 770 | 0.08 |
29 | 609 | 0.07 |
30 | 630 | 0.07 |
31 | 544 | 0.06 |
32 | 542 | 0.06 |
33 | 516 | 0.06 |
34 | 473 | 0.05 |
35 | 436 | 0.05 |
36 | 420 | 0.05 |
37 | 372 | 0.04 |
38 | 452 | 0.05 |
39 | 346 | 0.04 |
40 | 334 | 0.04 |
41 | 327 | 0.04 |
42 | 282 | 0.03 |
... | ... | ... |
1265 | 1 | 0 |
1291 | 1 | 0 |
1344 | 1 | 0 |
1347 | 1 | 0 |
1350 | 1 | 0 |
1548 | 1 | 0 |
1621 | 1 | 0 |
1622 | 1 | 0 |
1751 | 1 | 0 |
1884 | 1 | 0 |
1895 | 1 | 0 |
2150 | 1 | 0 |
2586 | 1 | 0 |
3703 | 1 | 0 |

Posted in reply to caveman529

08-05-2013 09:55 AM

If by zero truncated you mean there are no counts equal to zero by design, then I think what you might have to do is log transform the counts before using PROC PANEL. I don't see a way to specify a distribution in the documentation.

However, I may be jumping the gun. What are the panel data? There may be a way to specify a model in PROC QLIM that could address this.

Steve Denham

Posted in reply to SteveDenham

08-05-2013 04:41 PM

Hi, Steve:

Thank you for your suggestion. I updated some information on my question above. Could you share some of the insights you might have for dealing with this situation? I hope to include fixed effects into my model in order to mitigate omitted variable bias.

Posted in reply to caveman529

08-06-2013 09:49 AM

I come from a different statistical background, so you will have to help me out some. Why is this considered panel data? I may be missing something in the PROC FREQ output: is it ignoring time periods? Is the presence of time as a factor what makes this panel data? If so, then a repeated measures analysis over time using TCOUNTREG might be an option, as might GLIMMIX with a gamma distribution.

Sounds like QLIM is not the way to go, if time is a factor, and there is correlation in the counts between time intervals for the subjects. It just doesn't address repeated measures well (yet).

Steve Denham

Posted in reply to SteveDenham

08-06-2013 01:49 PM

Hi, Steve:

Many thanks for your valuable suggestions. I calculated the count myself as shown below and use it as a measure of readers' interest in the work of a particular author. table1 is the first table in this thread.

data table2;

set table1;

/* Assumes checkout_date (the date column in table1) is a numeric SAS date value */

year = year(checkout_date);

qtr = qtr(checkout_date);

yearqtr = cats(year, 'Q', qtr); /* e.g. 2008Q1; CATS converts the numbers and strips blanks */

run;

/* Count the number of times a guest searched for books by a particular author within each quarter (book type is ignored at this point) */

proc sql;

create table table3 as

select guest_id, yearqtr, author_id, count(*) as count

from table2

group by guest_id, yearqtr, author_id

order by guest_id, yearqtr, author_id;

quit;

/* Distribution of count, pooled over yearqtr for brevity */

proc freq data=table3;

table count;

run;

Sorry about the confusion. I checked the TCOUNTREG manual for 9.4; it doesn't seem to cover fixed-effects models for panel count data in which zeros are truncated, though. Could you expand a little on using the gamma distribution in this situation? Thanks -
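One possible way to fit a zero-truncated negative binomial directly is to write its log-likelihood by hand in PROC NLMIXED. The following is only a sketch under assumptions, not something proposed in this thread: `x` is a hypothetical covariate that would have to be merged onto table3, and the parameter names `b0`, `b1`, and `log_alpha` are made up for illustration.

```sas
/* Sketch: zero-truncated negative binomial (NB2) via a user-defined likelihood. */
/* x is a hypothetical covariate; table3 as built above has no covariates yet.   */
proc nlmixed data=table3;
   parms b0=0 b1=0 log_alpha=0;
   alpha = exp(log_alpha);                 /* keeps the dispersion positive */
   mu    = exp(b0 + b1*x);                 /* log link for the mean */
   /* ordinary NB2 log-likelihood */
   ll_nb = lgamma(count + 1/alpha) - lgamma(count + 1) - lgamma(1/alpha)
           + count*log(alpha*mu) - (count + 1/alpha)*log(1 + alpha*mu);
   p0    = (1 + alpha*mu)**(-1/alpha);     /* P(Y = 0) under NB2 */
   ll    = ll_nb - log(1 - p0);            /* condition on Y > 0 */
   model count ~ general(ll);
run;
```

As far as I know, NLMIXED has no CLASS statement, so fixed effects for yearqtr would have to enter as explicit dummy variables in the expression for `mu`. A separate dummy for every guest is unlikely to be practical with this many guests.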

Posted in reply to caveman529

08-07-2013 01:47 PM

Well, the gamma distribution is just a continuous distribution with a skew, which is what your data show. Your data would be an almost perfect fit to an exponential distribution if the count variable were replaced by count-1, and the exponential is a special case of the gamma. But you are lucky in a sense, because GLIMMIX requires a link of some kind between the original scale and the model scale, and for a gamma this is a log function. Since every count is at least 1, the response is strictly positive and the log is defined for all of your data.
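To spell out the special case mentioned above: with shape $k$ and scale $\theta$ (notation assumed here), the gamma density reduces to the exponential density when $k = 1$:

$$
f(y; k, \theta) = \frac{y^{k-1} e^{-y/\theta}}{\Gamma(k)\,\theta^{k}}, \quad y > 0
\qquad\Longrightarrow\qquad
f(y; 1, \theta) = \frac{1}{\theta}\, e^{-y/\theta}.
$$

Under a log link the model is linear on the log of the mean, $\log E[Y_i] = x_i'\beta$, so the fitted mean $\exp(x_i'\beta)$ is always positive.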

The following fits a fixed repeated effect of timeperiod with a response of count, summarized by timeperiod and guest_id.

Starter code:

proc glimmix data=yourdatasummarizedbytimeperiodandguest_id;

class timeperiod guest_id;

model count = timeperiod / dist=gamma; /*this could be dist=negbin for a negative binomial*/

random timeperiod/subject=guest_id type=cs;/*I am going with CS here, only because I don't know the time spacing, or much about the process generating the data*/

run;
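The input data set for the starter code has one row per guest and time period. A minimal sketch of building it from table2 earlier in the thread follows; the data set name `guest_period` is hypothetical, and `yearqtr` is assumed to play the role of `timeperiod`:

```sas
/* Sketch: one row per guest and quarter, with the number of log records as count. */
/* guest_period is a made-up name; pass it to the DATA= option of PROC GLIMMIX.    */
proc sql;
   create table guest_period as
   select guest_id,
          yearqtr  as timeperiod,
          count(*) as count
   from table2
   group by guest_id, yearqtr
   order by guest_id, yearqtr;
quit;
```

Note that this collapses over author_id, whereas table3 above keeps one row per guest, quarter, and author. Which level of aggregation is appropriate depends on the question being asked.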

Steve Denham

Message was edited by: Steve Denham

Posted in reply to SteveDenham

08-07-2013 05:37 PM

Thank you so much for your suggestions, Steve. They are very helpful. I'll give it a try!