mjkop56
Obsidian | Level 7

I’m running an analysis of trends in gender over time. The dependent variable is gender (Male, Female, Unknown), and there is one explanatory variable, year. I’ve run a multinomial model on a subset of my data (those with position = H and department = X) and produced predicted probabilities (following this example: https://stats.idre.ucla.edu/sas/dae/multinomiallogistic-regression/).

 

My goal is to produce the same model and predicted probabilities for every position and department in the data (there are maybe 10 position-department combinations in total).

At first, I thought I would do the analysis on full data and add variables for position, department and maybe some interactions.

Note: here lies the problem.

In the data I have, the same person can correspond to multiple departments and positions in the same year. I do not have any finer time indicator than year.  So for example:

 

person  year  position  department  gender
1       2009  H         A           M
1       2009  H         B           M
1       2009  L         C           M
1       2009  L         B           M

 

In producing the chart for a given department and position, I want to include everyone who was in that department and position in a given year, regardless of whether they also appear in other departments and positions in the same year.

 

Therefore, I'm thinking I just need to run the analysis separately for each combination of position and department. For example: keep the records with position H and department A, then run the model and produce the chart. Then go back to the original file, keep the records with position H and department B, and run the model and produce the chart, and so on. Does this sound OK? Or do I need to consider some kind of random effects model, with something like position and department nested in person-time?
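
The per-combination plan described above could be sketched with BY-group processing instead of subsetting the file by hand. This is only an illustration under stated assumptions: the dataset name WORK.STAFF and the output names are hypothetical, and the variables are assumed to be named person, year, position, department, and gender as in the example rows, with M as a gender level. Fitting each combination separately this way does not account for the same person appearing in several years.

```sas
/* Hypothetical input: WORK.STAFF with variables
   person, year, position, department, gender */
proc sort data=work.staff out=work.staff_sorted;
   by position department;
run;

/* One generalized-logit (multinomial) model per
   position-department combination */
proc logistic data=work.staff_sorted;
   by position department;
   /* ref='M' assumes M is one of the gender levels */
   model gender(ref='M') = year / link=glogit;
   /* PREDPROBS=INDIVIDUAL writes a predicted probability
      for each gender level to the output dataset */
   output out=work.pred predprobs=individual;
run;
```

The output dataset WORK.PRED keeps the BY variables, so the predicted probabilities can be charted per position and department.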


SteveDenham
Jade | Level 19

Is the data sequential within year?  If not, you are looking at a hierarchical, but unbalanced, design.  One option is to create an "artificial" variable that combines position and department, then run the analysis with time, the artificial variable, and their interaction.  If there are non-estimable lsmeans, drop back to fitting only the interaction.  To get the comparisons you want, I would recommend using LSMESTIMATE statements.
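
A minimal sketch of building such a combined variable, assuming a hypothetical dataset WORK.STAFF in which position and department are character variables; the name position_dept and the length of 40 are illustrative choices, not part of the original data.

```sas
/* Hypothetical input: WORK.STAFF with character variables
   position and department */
data work.staff_combined;
   set work.staff;
   /* assumed wide enough to hold both values plus the separator */
   length position_dept $ 40;
   /* CATX trims each piece and joins them with ", ", e.g. "H, A" */
   position_dept = catx(', ', position, department);
run;
```

The combined variable can then be listed in a CLASS statement and crossed with year in the model.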

 

SteveDenham

mjkop56
Obsidian | Level 7

Thank you!! This is extremely helpful!! The data is likely sequential within year, but there is no finer unit of time than year, so one cannot tell the order of the observations within a year, and interest is at the year level. Example of the data structure now:

 

person  year  position+department  gender (outcome)
1       2009  H, A                 M
1       2009  H, B                 M
1       2009  L, C                 M
1       2009  L, B                 M
1       2010  H, A                 M
2       2008  L, C                 F
2       2009  L, A                 F
3       2010  H, A                 Unknown
3       2010  H, B                 Unknown
3       2011  L, B                 Unknown

 

There are no duplicate records across person, year, and department+position. The same year can appear more than once within a person, and the same person can appear in multiple years.

I was thinking of using clustered standard errors to allow for repeated observations within individuals. Would that work, or does one need a multilevel model, for example with a random intercept for the person? I’m not used to dealing with cases where the same year is repeated within an individual.

SteveDenham
Jade | Level 19

Something using GEE sounds best to me.  The observations within year for each subject should be considered exchangeable.  This code might be a starting point.

 

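proc genmod data=yourdata;
   class person year position_dept;
   /* binomial distribution: expects a two-level response */
   model gender = year|position_dept / dist=binomial;
   /* exchangeable working correlation within each person */
   repeated subject=person / type=exch;
   /* ILINK reports the lsmeans on the probability scale */
   lsmeans year|position_dept / ilink;
   /* cumulative-residual check for year (VAR= expects a continuous
      covariate, so year may need to come out of CLASS for this step) */
   assess var=(year) / resample
                       seed=603708000;
run;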

Most of the code is adapted from the GENMOD example "Assessment of a Marginal Model for Dependent Data".  If you want differences, I strongly recommend the %NLMeans macro.  Search this forum for posts from @StatDave to learn more about this.

 

SteveDenham

mjkop56
Obsidian | Level 7

excellent. thank you so much!!!
