I am very new to SAS and need some help fitting a Poisson mixed model first and, in case of overdispersion, a negative binomial mixed model in the following context:
I want to isolate gender differences in preferences for reports, written by others, whose language scores high on several predetermined categories (e.g., personal pronouns). For each report I have the raw count of words falling into each language category, as well as the total number of words in the report.
I understand that I need mixed Poisson/negative binomial regression models because in my data (1) the same person buys multiple reports, and (2) the same report may be bought by multiple people.
I also need to control for the random effects of two categorical variables: country and industry.
How do I fit a mixed model for this situation with a count dependent variable (raw count of personal pronouns in the report), an offset (total number of words in the report), gender of the rater (m=0, f=1) as the predictor, and four grouping factors for the random effects (raterID, reportID, countryCode, industryCode)? And how can I tell from the regression results whether the Poisson or the negative binomial mixed model fits better?
I understand that I need to use NLMIXED, but I don't understand how to apply it to my situation. I would really appreciate your help.
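A minimal sketch of the model described in the question, assuming the variable names used in this thread (RAW_COUNT_PRONOUNS, TOTAL_WORDS, GenderRater already numeric 0/1) and the grouping variables RaterID, ReportID, countryCode and industryCode. Raters and reports are crossed rather than nested (one rater reads many reports, one report is read by many raters), so each grouping factor gets its own RANDOM statement. METHOD=LAPLACE is an assumption chosen because it yields an approximate true log likelihood (needed for comparing Poisson and negative binomial fits); with thousands of levels it may still be slow or memory hungry.

```sas
/* With a log link, the offset is the log of the exposure (total words). */
/* Assumes TOTAL_WORDS > 0; otherwise log() returns missing.            */
data myData2;
   set myData;
   log_TOTAL_WORDS = log(TOTAL_WORDS);
run;

/* Poisson GLMM with four crossed random intercepts (sketch).            */
/* GenderRater is numeric 0/1 and not in CLASS, so its estimate is the   */
/* female-minus-male difference in the log rate of pronouns per word.    */
proc glimmix data=myData2 method=laplace;
   class RaterID ReportID countryCode industryCode;
   model RAW_COUNT_PRONOUNS = GenderRater / dist=poisson link=log
         offset=log_TOTAL_WORDS solution;
   random intercept / subject=RaterID;
   random intercept / subject=ReportID;
   random intercept / subject=countryCode;
   random intercept / subject=industryCode;
run;
```

For the negative binomial version, change DIST=POISSON to DIST=NEGBINOMIAL and leave everything else as is.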
Unless you need a nonlinear model, you should be looking at PROC GLIMMIX; it supports Poisson and negative binomial responses with an offset.
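To decide between Poisson and negative binomial with GLIMMIX, one hedged approach is to fit the same model twice, changing only DIST=, and compare the fit statistics. The macro name %fitglmm and the output data set names are hypothetical. The ODS table names FitStatistics and CondFitStatistics are given from memory of the GLIMMIX documentation; running ODS TRACE ON first will confirm the names in your SAS release.

```sas
/* Hypothetical helper: fits the rater-level GLMM with a given distribution */
/* and saves the marginal and conditional fit statistics.                   */
%macro fitglmm(dist=, out=);
   proc glimmix data=myData method=laplace;
      class RaterID;
      model RAW_COUNT_PRONOUNS = GenderRater / dist=&dist link=log
            offset=log_TOTAL_WORDS solution;
      random intercept / subject=RaterID;
      ods output FitStatistics=fit_&out CondFitStatistics=cfit_&out;
   run;
%mend fitglmm;

%fitglmm(dist=poisson,     out=pois);
%fitglmm(dist=negbinomial, out=nb);

/* Pearson chi-square / DF (conditional) well above 1 for the Poisson fit  */
/* suggests overdispersion; compare AIC/BIC between fit_pois and fit_nb.   */
proc print data=cfit_pois; run;
proc print data=fit_pois;  run;
proc print data=fit_nb;    run;
```

A lower AIC for the negative binomial fit, together with a Poisson Pearson chi-square/DF clearly above 1, would both point toward the negative binomial model.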
I tried the following code with GLIMMIX:
proc glimmix data=myData method=quad;
   class GenderRater RaterID;
   /* log_TOTAL_WORDS = log(total words in the report), used as the offset */
   model RAW_COUNT_PRONOUNS = GenderRater / link=log s dist=negbinomial
         offset=log_TOTAL_WORDS;
   /* random intercept for each rater */
   random int / subject=RaterID;
run;
Within a second, it gives the following error: "The SAS system stopped processing this step because of insufficient memory."
I have only 50,000 observations and 1,500 raters. What am I doing wrong?
I also tried NLMIXED; it produces no results, although it reports no error:
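proc nlmixed data=myData;
   /* linear predictor; u is the rater-level random intercept */
   xb = b0 + b1*GenderRater + u;
   /* expected count; log_TOTAL_WORDS enters as the offset */
   mu = exp(xb + log_TOTAL_WORDS);
   m = 1/alpha;
   /* negative binomial log-likelihood with Var(y) = mu + alpha*mu**2 */
   ll = lgamma(RAW_COUNT_PRONOUNS + m) - lgamma(RAW_COUNT_PRONOUNS + 1) - lgamma(m)
        + RAW_COUNT_PRONOUNS*log(alpha*mu)
        - (RAW_COUNT_PRONOUNS + m)*log(1 + alpha*mu);
   model RAW_COUNT_PRONOUNS ~ general(ll);
   random u ~ normal(0, s2u) subject=RaterID;
run;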
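To answer the Poisson-versus-negative-binomial question within NLMIXED, a sketch of the matching Poisson model follows; it keeps the same fixed part, offset and rater random intercept and uses the built-in POISSON distribution of the MODEL statement. It assumes GenderRater is already numeric 0/1.

```sas
/* Poisson counterpart of the negative binomial NLMIXED model */
proc nlmixed data=myData;
   xb = b0 + b1*GenderRater + u;        /* same linear predictor       */
   mu = exp(xb + log_TOTAL_WORDS);      /* same offset                 */
   model RAW_COUNT_PRONOUNS ~ poisson(mu);
   random u ~ normal(0, s2u) subject=RaterID;
run;
```

The Poisson model is the negative binomial model with alpha fixed at 0. The difference in "-2 Log Likelihood" between the two runs can be referred to a chi-square distribution with 1 df; because alpha = 0 lies on the boundary of the parameter space, that p-value is conservative and halving it is a common correction.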
I was able to get NLMIXED running. The problem was the gender variable; it worked after I dummy coded it. However, GLIMMIX still gives the same memory error.
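The dummy coding step could look like the sketch below. It assumes (hypothetically) that GenderRater was stored as the characters 'm'/'f'; the new variable name GenderDummy is also hypothetical and would replace GenderRater in the xb line. Sorting by RaterID keeps each rater's rows together, which is a safe precaution for subject-based fits in both NLMIXED and GLIMMIX.

```sas
/* Hypothetical: GenderRater holds 'm' or 'f' as character values */
data myData_coded;
   set myData;
   if missing(GenderRater) then GenderDummy = .;               /* keep missing as missing */
   else GenderDummy = (lowcase(strip(GenderRater)) = 'f');     /* 1 = female, 0 = male    */
run;

/* Group each rater's observations together */
proc sort data=myData_coded;
   by RaterID;
run;
```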
Unfortunately, even the method=quad(fastquad qpoints=3) option gives the same memory error.
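A hedged troubleshooting sketch for the memory error, not a confirmed fix: first check how much memory the session is allowed to use, then refit after sorting by the subject variable, dropping the already dummy-coded GenderRater from CLASS, and switching from METHOD=QUAD to METHOD=LAPLACE. Whether any of these steps removes the error in this particular setup is an assumption.

```sas
/* Show the MEMSIZE limit for this SAS session */
proc options option=memsize;
run;

/* Keep each rater's observations contiguous for subject processing */
proc sort data=myData out=myData_sorted;
   by RaterID;
run;

/* GenderRater assumed numeric 0/1, so it is left out of CLASS */
proc glimmix data=myData_sorted method=laplace;
   class RaterID;
   model RAW_COUNT_PRONOUNS = GenderRater / dist=negbinomial link=log
         offset=log_TOTAL_WORDS solution;
   random intercept / subject=RaterID;
run;
```

If MEMSIZE turns out to be small, raising it is done at SAS startup (configuration file or command line), not with an OPTIONS statement inside the session.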