longitudinal data - multiple people per person/year

mjkop56 · Posted 11-14-2021 12:16 PM

Hi, I posted previously (https://communities.sas.com/t5/Statistical-Procedures/longitudinal-multilevel-analysis-multiple-reco...) and got some excellent advice about modeling a longitudinal data set.

Summary: Dependent variable = gender. Independent variables = year and person’s position_department (e.g. Senior-Marketing, Principal-Marketing). The interest is in changes in gender over time.

I was advised to fit a model with year, position_department, and their interaction, which worked nicely.

Now, I want to show predicted probabilities for: 1) ALL positions_departments combined (1 plot of gender over time), and 2) by department (vs. by position_department).

Problem: there are duplicate people in a given year because the same person can be associated with more than one position_department. For example, the person below shows up in the data 4 times in a given year (therefore, their gender would be counted 4 times):

person	gender	year	Position-department	department	position
1	M	2017	Senior-Marketing	Marketing	Senior
1	M	2017	Principal-Marketing	Marketing	Principal
1	M	2017	Senior-Finance	Finance	Senior
1	M	2017	Principal-Finance	Finance	Principal

This structure worked for my first question (analysis by position-department). However, if I just look at gender and year across all position-departments, this person would be counted 4 times. There seem to be 2 options:

1) Drop the duplicates (which would change the data and the total N) and run 2 extra models: one for all position-departments combined (data would have 1 row per person-year), and one for the analysis by department (data would have 1 row per person, year, and department).

2) Use the previously estimated model (year, position_department, and their interaction) and output the predicted probabilities at the department level and for all position-departments combined. This would give slightly different results, since the duplicates have not been dropped.
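
For reference, the two de-duplicated data sets described in option 1 could be built roughly as follows. This is only a sketch: the data set name `have` is hypothetical, and the variable names are taken from the example table above (with `position_department` written with an underscore so it is a valid SAS name). It also assumes gender is constant within a person.

```sas
/* Assumed input: work.have, one row per person-year-position_department. */
/* Data set and variable names are illustrative assumptions.              */

/* One row per person-year, for the "all combined" analysis */
proc sort data=have(keep=person gender year)
          out=person_year nodupkey;
  by person year;
run;

/* One row per person-year-department, for the by-department analysis */
proc sort data=have(keep=person gender year department)
          out=person_year_dept nodupkey;
  by person year department;
run;
```

NODUPKEY keeps the first row in each BY group, so the example person above would contribute 1 row to `person_year` and 2 rows (Marketing, Finance) to `person_year_dept` for 2017.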

Best option? Any thoughts would be much appreciated. Thank you!

SteveDenham · Posted 11-16-2021 09:59 AM

I like most of option 2. I would consider fitting only the interaction term (with a NOINT option) and then using LSMESTIMATE statements (with an ILINK option) to get at the questions of interest. Both option 2 and this method end up calculating marginal estimates for these effects - the proportions if all cells had an equal number of observations. If you wish to get at the differences on the probability scale, then you will need the %NLmeans macro (and you should read the SAS Note(s) on this and everything @StatDave has posted on this forum/topic).
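
A minimal sketch of what that could look like, under several assumptions that are not from the thread: PROC GLIMMIX is used, the data set is called `have`, gender is coded 'F'/'M', and there are only 2 years and 2 position_department levels (a toy layout chosen so the coefficients can be written out). Any RANDOM or repeated-measures specification from the earlier model is left out here.

```sas
/* Hypothetical sketch: data set name, 'F'/'M' coding, and the 2 x 2 layout */
/* are assumptions made for illustration only.                              */
proc glimmix data=have;
  class year position_department;

  /* Interaction only, no intercept: each parameter is one cell's logit */
  model gender(event='F') = year*position_department
        / noint dist=binary link=logit solution;

  /* Average the cell logits within each year with equal weights.        */
  /* Coefficients follow the cell order of year*position_department      */
  /* (position_department varies fastest); the E option prints them so   */
  /* the ordering can be checked against the real data.                  */
  lsmestimate year*position_department
      'Year 1, cells averaged' 1 1 0 0,
      'Year 2, cells averaged' 0 0 1 1 / divisor=2 ilink cl e;
run;
```

With real data the coefficient rows would need one entry per year by position_department cell. Note that ILINK back-transforms the averaged logit, which is generally not the same as averaging the cell probabilities.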

SteveDenham

mjkop56 · Posted 11-17-2021 09:20 AM

Thank you!!! Just so I understand, could you please explain why you would only include an interaction term and not the main effects?

Also, I was planning to plot observed data vs. the model's predicted probabilities. For the observed data on gender by year for everyone combined, I had planned to drop the duplicates in a given year, as the same person's gender would be counted multiple times. But if I don't drop duplicates before running the model, is that an issue when making comparisons between the observed data vs. predicted?
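
One way to see how much the duplicates matter for the observed series is to compute the yearly proportion both ways and compare. This is a sketch only: the data set name `have` and the 'F' coding for gender are assumptions.

```sas
/* Hypothetical names: have, obs_dedup, obs_all; gender assumed coded 'F'/'M' */
proc sql;
  /* Observed proportion with each person counted once per year */
  create table obs_dedup as
  select year, mean(gender = 'F') as prop_f, count(*) as n
  from (select distinct person, year, gender from have)
  group by year;

  /* Same proportion with every position-department row counted */
  create table obs_all as
  select year, mean(gender = 'F') as prop_f, count(*) as n
  from have
  group by year;
quit;
```

In `obs_all`, the example person above contributes 4 rows to 2017; in `obs_dedup`, only 1.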
