topic Re: Genmod for APC models - data distribution in Statistical Procedures

Genmod for APC models - data distribution

krejcia — Sat, 25 Apr 2015 20:05:56 GMT

Hello,

I have questions about Age-Period-Cohort models in SAS. I used proc genmod for age-specific fertility rates (Births/Exposure Female Population). I need to identify which effects are more important and how they changed fertility rates. The results are risk ratios. The explanatory variables are mother’s age, Period (4 periods of 5 years' length and 1 period of 4 years) and Cohort. Because of the identification problem I put a constraint on Period (and chose that period 2012 = reference period 2007.5). I overwrote it in the dataset (you can see it in Excel).

The regression formula is ln(r_{ap}) = intercept + α_a + β_p + γ_c + ε if I use the Poisson distribution. But I cannot use the Poisson distribution because of overdispersion. So I tried the negative binomial distribution. The formula for the binomial distribution should then be ln(r_{ap}/(1 - r_{ap})) = intercept + α_a + β_p + γ_c + ε, but I really don't know if and how I should set it in the SAS code. Or is it adjusted?

proc genmod data=genmod.input;
  class Age (ref='32.5') Period (ref='2007.5') Cohort (ref='1975') / param=ref;
  model Birth = Age Period Cohort / link=log dist=nb offset=log_Exposure type3;
run;

If I work on the assumption that the code is correct (I mean the model Birth and offset= parts), I have another problem. Even though I use the negative binomial distribution, the data are still overdispersed when all variables are used.

Could you please advise me?

Thank you very much in advance.

Anna K.

Re: Genmod for APC models - data distribution

jeonghyunwoo — Sat, 31 Oct 2015 06:04:27 GMT

Re: Genmod for APC models - data distribution

JacobSimonsen — Sun, 01 Nov 2015 23:11:40 GMT

You should not worry about overdispersion in such a model. Poisson regression can be used because the likelihood function (as a function of the parameters) is the same as the one you would have if the data were Poisson distributed. But the original data are time-to-event data, and it is only by a kind of mathematical luck that the parameters can be estimated with Poisson regression. That means you can make statistical inference about the parameters using Poisson regression, but you cannot use the statistics that measure how well your data fit a Poisson distribution.

For example, if you have N observations with time to event and you want to estimate the intensity, then you put N on the left side (which is of course not Poisson distributed) and the log of the total observation time in the offset. It would then be meaningless to test how well N fits a Poisson distribution, since it is a fixed number. That example is of course rather extreme, but it illustrates the principle.
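The likelihood argument above can be written out as a short sketch. It assumes a single constant intensity λ, individual follow-up times t_i and event indicators d_i; these symbols are introduced here for illustration and do not appear in the original posts.

```latex
% Time-to-event (exponential) log-likelihood with constant intensity \lambda
\log L_{\text{surv}}(\lambda)
  = \sum_i \bigl( d_i \log\lambda - \lambda t_i \bigr)
  = D \log\lambda - \lambda T,
  \qquad D = \sum_i d_i,\quad T = \sum_i t_i .

% Poisson log-likelihood for D with mean \lambda T (offset \log T)
\log L_{\text{Pois}}(\lambda)
  = D \log(\lambda T) - \lambda T - \log D!
  = \log L_{\text{surv}}(\lambda) + D \log T - \log D! .
```

The two expressions differ only by terms that do not involve λ, so both give the same estimate, D/T, and the same inference about λ, even though D itself need not be Poisson distributed.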

Re: Genmod for APC models - data distribution

JacobSimonsen — Fri, 12 Feb 2016 16:11:44 GMT

There is a much bigger problem in this model than overdispersion. As mentioned, overdispersion is not a problem at all.

The estimates from this model will unfortunately be completely meaningless! The problem is that age, period and cohort are linearly dependent. If, for example, you want period estimates, then you cannot identify either an intercept or a linear trend. The intercept may not be a problem, as you may only want to estimate the differences between periods. But the linear trend is a bigger problem.
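A short worked equation may help show why the linear trend cannot be identified. It assumes that cohort is defined as c = p - a and uses the effect symbols α, β, γ from the question; the constant δ is introduced here for illustration only.

```latex
% With c = p - a, shift the three sets of effects by an arbitrary slope \delta:
\tilde\alpha_a = \alpha_a + \delta a, \qquad
\tilde\beta_p  = \beta_p  - \delta p, \qquad
\tilde\gamma_c = \gamma_c + \delta c .

% The linear predictor is unchanged:
\tilde\alpha_a + \tilde\beta_p + \tilde\gamma_c
  = \alpha_a + \beta_p + \gamma_c + \delta\,\bigl(a - p + (p - a)\bigr)
  = \alpha_a + \beta_p + \gamma_c .
```

Every value of δ gives exactly the same fitted rates, so the data cannot say how much of a linear trend belongs to age, period or cohort.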

It is rather hard to solve the problem. What you should do (still assuming you want period estimates) is take the column vectors in your design matrix that come from the cohort effect and project them onto the orthogonal complement of the space spanned by the linear trend. You can then adjust for this "untrended" cohort effect. If you assume that the linear trend is due only to the period effect (and not the cohort effect), you will get period estimates that are adjusted for age and an "untrended" cohort effect.

Proc IML may be a big help for doing all the linear algebra that is necessary.
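The projection step could be sketched in Proc IML roughly as follows. This is only an illustration under several assumptions: Cohort is a numeric variable with no missing values in genmod.input, the "trend space" is taken to be the intercept plus a linear cohort term, and the projection uses the ordinary unweighted inner product (the cited paper discusses other choices). The names coh, D, T, Ddet, work.cdet and work.apc are hypothetical.

```sas
proc iml;
  /* Assumption: genmod.input has a numeric Cohort variable, no missing values */
  use genmod.input;
  read all var {Cohort} into coh;
  close genmod.input;

  D = design(coh);                  /* one dummy column per distinct cohort */
  T = j(nrow(coh), 1, 1) || coh;    /* intercept and linear cohort trend    */

  /* Remove from each dummy column the part that lies in span(T) */
  Ddet = D - T * (ginv(T) * D);

  /* The detrended columns have two linear dependencies; drop two columns */
  Ddet = Ddet[, 1:(ncol(Ddet) - 2)];

  create work.cdet from Ddet;       /* default variable names COL1, COL2, ... */
  append from Ddet;
  close work.cdet;
quit;

data work.apc;
  merge genmod.input work.cdet;     /* one-to-one merge, relies on unchanged row order */
run;

proc genmod data=work.apc;
  class Age (ref='32.5') Period (ref='2007.5') / param=ref;
  /* col: is the name-prefix list for the detrended cohort columns */
  model Birth = Age Period col: / link=log dist=poisson offset=log_Exposure type3;
run;
```

Cohort is left out of the CLASS and MODEL statements because the detrended columns replace it, and dist=poisson follows the advice earlier in the thread. The period estimates from such a fit carry the whole linear trend by construction, so they should be read with that assumption in mind.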

Reference: Carstensen, Bendix. Age-period-cohort models for the Lexis diagram - Reply. Statistics in Medicine, Vol. 27, No. 9, 2008, p. 1561-1564.