Solved: Re: longitudinal/multilevel analysis - multiple records per person-year

mjkop56 · Posted 10-04-2021 07:45 PM

I’m running an analysis of trends in gender over time. The dependent variable is gender (Male, Female, Unknown), and the single explanatory variable is year. I’ve run a multinomial model on a subset of my data (those with position = H and department = X) and produced predicted probabilities (example: https://stats.idre.ucla.edu/sas/dae/multinomiallogistic-regression/).

My goal: to produce the same model+predicted probabilities for every position and department in the data (there are maybe 10 total position-department combinations).

At first, I thought I would do the analysis on full data and add variables for position, department and maybe some interactions.

Note: here lies the problem.

In the data I have, the same person can correspond to multiple departments and positions in the same year. I do not have any finer time indicator than year. So for example:

person	year	position	department	gender
1	2009	H	A	M
1	2009	H	B	M
1	2009	L	C	M
1	2009	L	B	M

In producing the chart for a given department and position, I want to include everyone who was in that department and position in a given year, regardless of whether they also appear in other departments and positions in the same year.

Therefore, I'm thinking I just need to run the analysis separately for each combination of position and department. For example: keep the records with position H and department A, then run the model and produce the chart; go back to the original file, keep the records with position H and department B, then run the model and produce the chart; and so on. Does this sound OK? Or do I need to consider some kind of random effects model, with something like position and department nested in person-time?
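The one-model-per-combination idea described above could be sketched with BY-group processing rather than repeated manual subsetting. This is only an illustration: the dataset name yourdata, the output names sorted and pred, and the value 'M' for the reference level are assumptions based on the example rows, and the sketch treats records as independent, so it does not address repeated persons.

```sas
/* Sketch only: yourdata, sorted, and pred are assumed names */
proc sort data=yourdata out=sorted;
   by position department;
run;

proc logistic data=sorted;
   by position department;                      /* one model per position-department combination */
   model gender(ref='M') = year / link=glogit;  /* generalized logit, i.e. multinomial */
   output out=pred predprobs=individual;        /* predicted probability for each gender level */
run;
```

Because each BY group is fitted on its own, a person who appears in several combinations in the same year contributes to each of those models.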

SteveDenham · Posted 10-05-2021 11:07 AM

Is the data sequential within year? If not, you are looking at a hierarchical, but unbalanced, design. One option would be to create an "artificial" variable that combines position and department. Run the analysis with time, the artificial variable, and their interaction. If there are non-estimable lsmeans, then drop back to fitting only the interaction. To get the comparisons you want, I would recommend using LSMESTIMATE statements.
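A minimal sketch of building such a combined variable in a DATA step follows. The dataset names yourdata and yourdata2, the variable name position_dept, the underscore separator, and the length of 40 are all assumptions chosen for illustration.

```sas
/* Sketch only: dataset names, separator, and length are assumed */
data yourdata2;
   set yourdata;
   length position_dept $ 40;
   /* e.g. position='H', department='A' gives position_dept='H_A' */
   position_dept = catx('_', position, department);
run;
```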

SteveDenham

mjkop56 · Posted 10-06-2021 06:43 PM

Thank you!! This is extremely helpful!! The data is likely sequential within year, but there is no finer unit of time than year, so the order of the observations within a year cannot be determined, and interest is at the year level. Example of the data structure now:

person	year	position+department	gender (outcome)
1	2009	H, A	M
1	2009	H, B	M
1	2009	L, C	M
1	2009	L, B	M
1	2010	H, A	M
2	2008	L, C	F
2	2009	L, A	F
3	2010	H, A	Unknown
3	2010	H, B	Unknown
3	2011	L, B	Unknown

There are no duplicate records across person, year, and department+position. A person can have several records in the same year, and the same person can appear in multiple years.
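The no-duplicates claim could be checked with a query along these lines. It is a sketch that assumes a dataset named yourdata containing a combined variable named position_dept; the query returns no rows when every person-year-combination is unique.

```sas
/* Sketch only: yourdata and position_dept are assumed names */
proc sql;
   select person, year, position_dept, count(*) as n_records
   from yourdata
   group by person, year, position_dept
   having count(*) > 1;   /* lists only keys that occur more than once */
quit;
```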

I was thinking of using clustered standard errors to allow for repeated observations within individuals. Would that work, or does one need a multilevel model with, for example, a random intercept for the person? I’m not used to dealing with cases where the same year is repeated within an individual.
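For concreteness, the clustered-standard-error idea might look something like the sketch below, which uses PROC SURVEYLOGISTIC with person as the cluster. The dataset name yourdata and the reference level 'M' are assumptions, and this is one possible way to obtain cluster-robust standard errors for a multinomial model, not a recommendation over other approaches.

```sas
/* Sketch only: yourdata and the reference level 'M' are assumed */
proc surveylogistic data=yourdata;
   cluster person;                              /* records from one person form a cluster */
   model gender(ref='M') = year / link=glogit;  /* generalized logit, i.e. multinomial */
run;
```

The point estimates match those of an ordinary multinomial fit; only the variance estimation accounts for the clustering.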

SteveDenham · Posted 10-07-2021 08:41 AM

Something using GEE sounds best to me. The observations within year for each subject should be considered exchangeable. This code might be a starting point.

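proc genmod data=yourdata;
   class person year position_dept;
   /* dist=bin assumes a two-level response; a three-level gender outcome would need recoding first */
   model gender = year|position_dept / dist=bin;
   /* exchangeable working correlation among a person's records; the default is independence */
   repeated subject=person / type=exch;
   lsmeans year|position_dept / ilink;

   /* ASSESS checks the functional form of a continuous covariate, so it applies only if year is removed from CLASS */
   assess var=(year) / resample
                       seed=603708000;
run;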

Most of the code is adapted from the GENMOD example Assessment of a Marginal Model for Dependent Data. If you want differences, I strongly recommend the use of the %NLmeans macro. Search this forum for posts from @StatDave to learn more about this.

SteveDenham

mjkop56 · Posted 10-07-2021 08:49 AM

Excellent. Thank you so much!!!
