Solved: Re: Can I do a GLM Binomial regression on pseudo observations (repeated measures with dependency)?

Laastat · Posted 05-10-2024 01:04 PM

I'm working on an actuarial project to estimate monthly probabilities that someone becomes disabled.

A portfolio of persons (each with a different 'PolicyNr') is observed for 12 months, and the time until disability is recorded in the variable 'TimetoDisability'. When no disability occurred during the 12 months, the variable 'RightCensored' has the value 1. We further have the variables 'Gender', 'AgeatDisability' (which equals the age at disability, or the age after 12 months for the right-censored observations), and 'OccupationClass'.

We have data available in a wide format. For example, the following lines are part of the data:

PolicyNr	TimetoDisability	RightCensored	Gender	AgeatDisability	OccupationClass
001	2 months	0	Male	40 year	1
002	3 months	0	Male	30 year	2
003	12 months	1	Female	42 year	1

Intuitively, I would model this using a Cox proportional hazards model, with the variable 'TimetoDisability' as the time until the occurrence of the disability, and 'Gender', 'AgeatDisability', and 'OccupationClass' as covariates. Monthly probabilities would then be derived from the estimated survival function.
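As a sketch of that last step, assuming T is the month in which disability occurs and S(t) = Pr(T > t) is the estimated probability of not yet being disabled after month t (the symbols T, S and q_t are introduced here for illustration and do not appear in the data), the monthly probability is the conditional probability of becoming disabled in month t given no disability before:

```latex
q_t \;=\; \Pr(T = t \mid T \ge t)
    \;=\; \frac{S(t-1) - S(t)}{S(t-1)}
    \;=\; 1 - \frac{S(t)}{S(t-1)},
\qquad t = 1, \dots, 12, \quad S(0) = 1 .
```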

Now assume that, for practical/technical reasons, it is only possible to perform a GLM Binomial regression. I read that performing a GLM Binomial regression on data with pseudo observations is analogous to a Cox discrete-time survival model. To prepare the analysis, I transform the wide dataset into a long dataset (with pseudo observations, see, e.g., https://grodri.github.io/glms/notes/c7s6), in which each line is duplicated according to the variable 'TimetoDisability'. For example, the first line from the table above is transformed into 2 lines, as it took 2 months to become disabled. The last of these lines has the value 1 for the variable 'Disability', as the disability occurred in month 2. The variable 'AgeatDisability' is transformed into the variable 'Age', which now represents the age during that month. The right-censored observation is transformed into 12 lines, all having the value 0 for the variable 'Disability', as no disability is observed. This becomes:

PolicyNr	Duration	Disability	Gender	Age	OccupationClass
001	1	0	Male	39 year 11 months	1
001	2	1	Male	40 year	1
002	1	0	Male	29 year 10 months	2
002	2	0	Male	29 year 11 months	2
002	3	1	Male	30 year	2
003	1	0	Female	41 year 1 months	1
003	2	0	Female	41 year 2 months	1
003	3	0	Female	41 year 3 months	1
003	4	0	Female	41 year 4 months	1
003	5	0	Female	41 year 5 months	1
003	6	0	Female	41 year 6 months	1
003	7	0	Female	41 year 7 months	1
003	8	0	Female	41 year 8 months	1
003	9	0	Female	41 year 9 months	1
003	10	0	Female	41 year 10 months	1
003	11	0	Female	41 year 11 months	1
003	12	0	Female	42 year	1
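The expansion shown in the table above can be sketched in a few lines. This is a minimal illustration in Python rather than SAS; the `Policy` class, the `to_person_period` function, and the choice to store ages in months (`AgeMonths`) are assumptions made for the example, not part of the original data.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    policy_nr: str
    time_to_disability: int  # number of months observed (1..12)
    right_censored: int      # 1 = no disability within the 12 months
    gender: str
    age_at_end_months: int   # 'AgeatDisability' expressed in months (assumption)
    occupation_class: int


def to_person_period(policies):
    """Expand each policy into one row per observed month."""
    rows = []
    for p in policies:
        for month in range(1, p.time_to_disability + 1):
            is_last = month == p.time_to_disability
            rows.append({
                "PolicyNr": p.policy_nr,
                "Duration": month,
                # Only the last row of a non-censored policy carries the event.
                "Disability": int(is_last and p.right_censored == 0),
                "Gender": p.gender,
                # Age during this month, counted back from the last observed month.
                "AgeMonths": p.age_at_end_months - (p.time_to_disability - month),
                "OccupationClass": p.occupation_class,
            })
    return rows


# The three example policies from the wide table.
wide = [
    Policy("001", 2, 0, "Male", 40 * 12, 1),
    Policy("002", 3, 0, "Male", 30 * 12, 2),
    Policy("003", 12, 1, "Female", 42 * 12, 1),
]

long_rows = to_person_period(wide)
for row in long_rows:
    print(row)
```

For the three example policies this yields 2 + 3 + 12 rows, with 'Disability' equal to 1 only on the last row of policies 001 and 002.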

Question:

In this long data format, the multiple rows (pseudo observations) for each person are not independent. We have repeated measures for each person.

However, I read in Therneau and Grambsch: (quote)

"One concern that often arises is that observations [on the same individual] are "correlated," and would thus not be handled by standard methods. This is not actually an issue. The internal computations for a Cox model have a term for each unique death or event time..."

So for a Cox Discrete Time Survival model, the dependency is not an issue.

However, I don't see why the dependency in the data would not be an issue for a GLM Binomial regression.

Given the dependency in the data, is it appropriate to fit a GLM to get trustworthy estimates of monthly probabilities? Or should I go for a mixed-effects model?

Thank you.

sbxkoenk · Posted 05-11-2024 07:12 AM

Due to lack of time, I have only skimmed your post.

See here. Might be useful.

It's about discrete-time logistic hazard regression (aka survival data mining), to be done with PROC LOGISTIC:

Predictive Modeling Using Survival Analysis (NESUG18 conference)
Vadim Pliner, Verizon Wireless, Orangeburg, NY
https://www.lexjansen.com/nesug/nesug05/pos/pos6.pdf

Allison P.D. (1982), “Discrete-Time Methods for the Analysis of Event Histories,” in Sociological Methodology 1982, Jossey-Bass.

Predicting Customer Value (SUGI 30 conference 2005, Paper 073-30)
Will Potts, Data Miners, Inc.
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/073-30.pdf

It’s About Time: Discrete Time Survival Analysis Using SAS® Enterprise Miner™ (SAS Global Forum 2012, Paper 132-2012)
Sascha Schubert, SAS Institute Inc., Heidelberg, Germany; Susan Haller and Taiyeong Lee, SAS Institute Inc., Cary, NC
https://support.sas.com/resources/papers/proceedings12/132-2012.pdf

Art or science? Choosing the right regression model
by Udo Sglavo, October 11, 2021
https://blogs.sas.com/content/subconsciousmusings/2021/10/11/art-or-science-choosing-the-right-regre...
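On the dependency question itself, the usual argument for discrete-time hazard models (also made in the notes linked in the question) is that one person's survival likelihood factorizes into one Bernoulli term per month at risk. The long dataset therefore produces the same likelihood as the survival model, even though the rows are not independent observations. A minimal numerical sketch in Python rather than SAS follows; the hazard values and the function names are made up for illustration.

```python
import math


def survival_loglik(hazards, event):
    """Discrete-time survival log-likelihood for one person.

    hazards[k] is the conditional probability of disability in month k+1
    given no disability before; event is 1 if disability occurred in the
    last observed month and 0 if the person is right censored.
    """
    *earlier, last = hazards
    ll = sum(math.log(1.0 - h) for h in earlier)  # no disability in earlier months
    ll += math.log(last) if event else math.log(1.0 - last)
    return ll


def bernoulli_loglik(hazards, outcomes):
    """Log-likelihood treating every person-month row as a separate Bernoulli trial."""
    return sum(
        y * math.log(h) + (1 - y) * math.log(1.0 - h)
        for h, y in zip(hazards, outcomes)
    )


# Hypothetical monthly hazards for the three example policies: (hazards, event).
people = {
    "001": ([0.010, 0.012], 1),
    "002": ([0.020, 0.022, 0.024], 1),
    "003": ([0.008] * 12, 0),
}

total_survival = 0.0
total_bernoulli = 0.0
for policy_nr, (hazards, event) in people.items():
    # 'Disability' column of the long dataset: zeros, then the event indicator.
    outcomes = [0] * (len(hazards) - 1) + [event]
    total_survival += survival_loglik(hazards, event)
    total_bernoulli += bernoulli_loglik(hazards, outcomes)

print(total_survival, total_bernoulli)
```

Because the two totals agree for any choice of hazards, maximizing the binomial likelihood on the long dataset gives the same estimates as maximizing the discrete-time survival likelihood, so the repeated rows per person do not by themselves call for a mixed model. Whether unobserved heterogeneity between persons needs to be modelled is a separate question.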

Good luck with your modelling efforts,

Koen
