mjkop56
Obsidian | Level 7

Hi, I posted previously (https://communities.sas.com/t5/Statistical-Procedures/longitudinal-multilevel-analysis-multiple-reco...) and got some excellent advice about modeling a longitudinal data set.

 

Summary: Dependent variable = gender. Independent variables = year and the person's position_department (e.g., Senior-Marketing, Principal-Marketing). The interest is in how the gender composition changes over time.

Advice received: fit a model with year, position_department, and their interaction, which worked nicely.

Now I want to show predicted probabilities 1) for ALL position_departments combined (one plot of gender over time), and 2) by department (rather than by position_department).

Problem: a person can appear more than once in a given year, because the same person can be associated with more than one position_department. For example, the person below shows up in the data 4 times in 2017, so their gender would be counted 4 times:

person  gender  year  position_department  department  position
1       M       2017  Senior-Marketing     Marketing   Senior
1       M       2017  Principal-Marketing  Marketing   Principal
1       M       2017  Senior-Finance       Finance     Senior
1       M       2017  Principal-Finance    Finance     Principal

This structure worked for my first question (analysis by position_department). However, if I just look at gender and year across all position_departments, this person would be counted 4 times. There seem to be 2 options:

1) Drop the duplicates (which would change the data and the total N) and run 2 extra models: one for all position_departments combined (so the data would have 1 row per person-year), and one for the analysis by department (1 row per person, year, and department).

2) Use the previously estimated model (year, position_department, and their interaction) and output the predicted probabilities at the department level and for all position_departments combined. This would give slightly different results, since duplicates have not been dropped.
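
To make the difference between the two options concrete, here is a minimal Python sketch (standard library only) of how the observed proportion shifts depending on whether duplicates are kept, dropped to one row per person-year, or dropped to one row per person-year-department. The rows mirror the example table above; person 2, the `dedupe` helper, and the `proportion_female` helper are hypothetical and added only for illustration.

```python
# Rows mirror the example table; person 2 is a hypothetical addition.
rows = [
    {"person": 1, "gender": "M", "year": 2017, "department": "Marketing", "position": "Senior"},
    {"person": 1, "gender": "M", "year": 2017, "department": "Marketing", "position": "Principal"},
    {"person": 1, "gender": "M", "year": 2017, "department": "Finance", "position": "Senior"},
    {"person": 1, "gender": "M", "year": 2017, "department": "Finance", "position": "Principal"},
    {"person": 2, "gender": "F", "year": 2017, "department": "Marketing", "position": "Senior"},
]


def dedupe(data, keys):
    """Keep the first row for each distinct combination of `keys`."""
    seen = set()
    kept = []
    for row in data:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def proportion_female(data):
    """Share of rows with gender == 'F'."""
    return sum(r["gender"] == "F" for r in data) / len(data)


# One row per person-year (for the "everyone combined" plot).
person_year = dedupe(rows, ["person", "year"])

# One row per person-year-department (for the "by department" plot).
person_year_dept = dedupe(rows, ["person", "year", "department"])

for label, data in [
    ("all rows", rows),
    ("person-year", person_year),
    ("person-year-department", person_year_dept),
]:
    print(label, len(data), round(proportion_female(data), 3))
```

In this toy data, person 1 contributes 4 of the 5 rows before deduplication but only 1 of the 2 rows afterwards, so the observed proportion depends heavily on which unit of analysis is chosen.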

 

Which option is best? Any thoughts would be much appreciated. Thank you!

2 REPLIES
SteveDenham
Jade | Level 19

I like most of option 2. I would consider fitting only the interaction term (with a NOINT option) and then using LSMESTIMATE statements (with an ILINK option) to get at the questions of interest. Both option 2 and this method end up calculating marginal estimates for these effects, that is, the proportions you would expect if all cells had an equal number of observations. If you wish to get at the differences on the probability scale, then you will need the %NLmeans macro (and you should read the SAS Note(s) on this and everything @StatDave has posted on this forum/topic).
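
As a rough illustration of the arithmetic behind this advice (not of SAS's internals), the Python sketch below averages per-cell logits with equal weights and then applies the inverse logit, which is the kind of quantity an equal-weight estimate with an inverse-link transformation yields. It also shows why a difference needs separate handling: back-transforming a difference of logits does not give a difference of probabilities. All cell values are hypothetical, and the names `cell_logit`, `marginal_logit`, and `marginal_prob` are made up for this example.

```python
import math


def inv_logit(x):
    """Inverse of the logit link: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical per-cell estimates on the logit scale, one per
# year x position_department cell (what an interaction-only,
# no-intercept parameterization would estimate directly).
cell_logit = {
    (2017, "Senior-Marketing"): -1.0,
    (2017, "Principal-Marketing"): 0.0,
    (2018, "Senior-Marketing"): 0.0,
    (2018, "Principal-Marketing"): 1.0,
}


def marginal_logit(year):
    """Equal-weight average of the cell logits within a year."""
    values = [v for (y, _), v in cell_logit.items() if y == year]
    return sum(values) / len(values)


# Marginal estimate per year, back-transformed to the probability scale.
marginal_prob = {year: inv_logit(marginal_logit(year)) for year in (2017, 2018)}

# Difference between years on the probability scale.
prob_difference = marginal_prob[2018] - marginal_prob[2017]

# Back-transforming the logit-scale difference is NOT the same thing.
back_transformed_difference = inv_logit(marginal_logit(2018) - marginal_logit(2017))

# Averaging the cell probabilities also differs from back-transforming
# the averaged logits, because the inverse logit is nonlinear.
mean_of_cell_probs_2017 = sum(
    inv_logit(v) for (y, _), v in cell_logit.items() if y == 2017
) / 2

print(marginal_prob)
print(prob_difference, back_transformed_difference)
print(mean_of_cell_probs_2017)
```

Because every cell gets the same weight here regardless of how many rows it contains, these marginal values will generally not match raw observed proportions unless the cells happen to be balanced.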

 

SteveDenham

mjkop56
Obsidian | Level 7

Thank you!!! Just so I understand, could you please explain why you would only include an interaction term and not the main effects?

 

Also, I was planning to plot the observed data vs. the model's predicted probabilities. For the observed data on gender by year for everyone combined, I had planned to drop the duplicates within each year, since otherwise the same person's gender would be counted multiple times. But if I don't drop duplicates before running the model, is that an issue when comparing observed vs. predicted?
