Hi, I posted previously (https://communities.sas.com/t5/Statistical-Procedures/longitudinal-multilevel-analysis-multiple-reco...) and got some excellent advice about modeling a longitudinal data set.
Summary: Dependent variable = gender. Independent variables = year and the person’s position_department (e.g. Senior-Marketing, Principal-Marketing). The interest is in changes in gender over time.
The advice I received was to fit a model with year, position_department, and their interaction, which worked nicely.
Now, I want to show predicted probabilities for: 1) all position_departments combined (one plot of gender over time), and 2) each department (rather than each position_department).
Problem: there are duplicate people in a given year because the same person can be associated with more than one position_department. For example, the person below shows up in the data 4 times in a given year (therefore, their gender would be counted 4 times):
person | gender | year | Position-department | department | position |
1 | M | 2017 | Senior-Marketing | Marketing | Senior |
1 | M | 2017 | Principal-Marketing | Marketing | Principal |
1 | M | 2017 | Senior-Finance | Finance | Senior |
1 | M | 2017 | Principal-Finance | Finance | Principal |
This structure worked for my first question (analysis by position-department). However, if I look only at gender and year across all position-departments, this person would be counted 4 times. There seem to be 2 options:
1) Drop the duplicates (which would change the data and the total N) and run 2 extra models: one for all position-departments combined (data would have 1 row per person-year), and one for the analysis by department (data would have 1 row per person, year, and department).
2) Use the previously estimated model of year, position_department, and their interaction, and output the predicted probabilities at the department level and for all position-departments combined. This would give slightly different results, since duplicates have not been dropped.
Which is the best option? Any thoughts would be much appreciated. Thank you!
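To make the double-counting concrete, here is a minimal sketch in Python (not SAS, and not part of the original post) of how the observed proportion changes with the unit of analysis. The first four rows mirror the example table; the second person is invented purely for contrast, and `proportion_male` is a hypothetical helper.

```python
# Rows are (person, gender, year, department, position).
# Person 1 mirrors the example table; person 2 is an invented row for contrast.
rows = [
    (1, "M", 2017, "Marketing", "Senior"),
    (1, "M", 2017, "Marketing", "Principal"),
    (1, "M", 2017, "Finance", "Senior"),
    (1, "M", 2017, "Finance", "Principal"),
    (2, "F", 2017, "Marketing", "Senior"),
]

def proportion_male(records, key):
    """Share of 'M' after keeping one record per distinct key."""
    seen = {}
    for rec in records:
        seen.setdefault(key(rec), rec[1])  # keep the first gender seen per key
    genders = list(seen.values())
    return sum(g == "M" for g in genders) / len(genders)

# Every row counted: person 1 contributes four times.
raw = proportion_male(rows, key=lambda r: r)
# One row per person-year (the "all combined" data set in option 1).
per_person_year = proportion_male(rows, key=lambda r: (r[0], r[2]))
# One row per person-year-department (the "by department" data set in option 1).
per_person_year_dept = proportion_male(rows, key=lambda r: (r[0], r[2], r[3]))

print(raw, per_person_year, per_person_year_dept)
```

With these made-up rows the three proportions differ (4 of 5, 1 of 2, and 2 of 3), which is the discrepancy between the two options described above.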
I like most of option 2. I would consider fitting only the interaction term (with a NOINT option) and then using LSMESTIMATE statements (with an ILINK option) to get at the questions of interest. Both option 2 and this method end up calculating marginal estimates for these effects, that is, the proportions you would expect if all cells had an equal number of observations. If you wish to get at the differences on the probability scale, then you will need the %NLMeans macro (and you should read the SAS Note(s) on this and everything @StatDave has posted on this forum/topic).
SteveDenham
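The arithmetic behind that distinction can be sketched in a few lines of Python (an illustration only, not SAS output). The cell probabilities below are invented, and the sketch assumes a logit link, so that with only the interaction term and no intercept each parameter is one cell's logit and a marginal estimate is an equal-weighted combination of those logits.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical fitted cell probabilities for one year (not from the thread's data).
cell_prob = {
    ("Marketing", "Senior"): 0.9,
    ("Marketing", "Principal"): 0.5,
}
# In a cell-means parameterization, each parameter is one cell's logit.
cell_logit = {cell: logit(p) for cell, p in cell_prob.items()}

# Department-level estimate: equal weights on the department's cells,
# combined on the logit scale and then back-transformed.
weights = {cell: 1 / len(cell_logit) for cell in cell_logit}
marginal_logit = sum(weights[c] * cell_logit[c] for c in cell_logit)
back_transformed = inv_logit(marginal_logit)

# Simple average of the cell probabilities, for comparison.
mean_of_probs = sum(cell_prob.values()) / len(cell_prob)

# A back-transformed difference of logits is not a difference of probabilities.
diff_logit = cell_logit[("Marketing", "Senior")] - cell_logit[("Marketing", "Principal")]
diff_prob = cell_prob[("Marketing", "Senior")] - cell_prob[("Marketing", "Principal")]

print(back_transformed, mean_of_probs)
print(inv_logit(diff_logit), diff_prob)
```

Here the back-transformed marginal logit is about 0.75 while the plain average of the two probabilities is 0.70, and back-transforming the logit difference gives about 0.9 rather than the probability difference of 0.4. That gap is why differences on the probability scale need a separate tool.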
Thank you!!! Just so I understand, could you please explain why you would only include an interaction term and not the main effects?
Also, I was planning to plot the observed data vs. the model's predicted probabilities. For the observed data on gender by year for everyone combined, I had planned to drop the duplicates in a given year, since otherwise the same person's gender would be counted multiple times. But if I don't drop duplicates before running the model, is that an issue when comparing the observed data vs. the predicted probabilities?