khollid
Fluorite | Level 6

Hi.  I am having trouble identifying the best SAS PROC for my analytic question.  I would like to calculate an ICC for data that have a nominal (i.e., categorical but not ordinal) outcome.  The data are hierarchical, falling into four levels, and are also repeated over time.  I have not been able to find a procedure that can handle all of these features.  GLIMMIX seems to handle nominal hierarchical data, but I have not found a way to add the repeated part.  Is this possible?  Should I use a different PROC?  My code thus far (although it is not all correct) is below.

 

The data structure is "round" nested within "day" nested within participant ("ID") nested within "state".  The outcome is the nominal variable "main", and it is repeated over "minute" (1-1440 for each day).  The code below does not work with the last RANDOM statement included (it returns an error that GLIMMIX cannot have R-side random effects with the multinomial distribution).  I am also not confident that the overall setup is correct.  I would like to calculate the ICC for the variation due to day.  This is the only parameter I am interested in.  Any suggestions are appreciated.

 

PROC GLIMMIX DATA=overall METHOD=RSPL;
CLASS state ID day round minute main;
MODEL main (order=freq ref=first)= / DIST=multinomial LINK=GLOGIT;
RANDOM INTERCEPT / SUBJECT=state group=main TYPE=VC ;
RANDOM INTERCEPT / SUBJECT=ID (state) group=main TYPE=VC ;
RANDOM INTERCEPT / SUBJECT=day (ID state) group=main TYPE=VC ;
RANDOM INTERCEPT / SUBJECT=round (day ID state) group=main TYPE=VC ;
RANDOM minute/ SUBJECT=round(day ID state) group=main2 type=VC residual;
ChrisHemedinger
Community Manager
Note: I moved this topic to Statistical Procedures. You'll probably get a faster answer here.
SteveDenham
Jade | Level 19
PROC GLIMMIX DATA=overall METHOD=Laplace;
CLASS state ID day round minute main;
MODEL main (order=freq ref=first)= / DIST=multinomial LINK=GLOGIT;
RANDOM INTERCEPT / SUBJECT=state group=main TYPE=VC ;
RANDOM INTERCEPT / SUBJECT=ID (state) group=main TYPE=VC ;
RANDOM INTERCEPT / SUBJECT=day (ID state) group=main TYPE=VC ;
RANDOM INTERCEPT / SUBJECT=round (day ID state) group=main TYPE=VC ;
RANDOM minute/ SUBJECT=round(day ID state) group=main type=ar(1) ;
RUN;

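Try the above.  I switched the method to Laplace and changed the final RANDOM statement from an R-side effect to a G-side repeated-measures effect (the RESIDUAL option is gone and the covariance structure is TYPE=AR(1)).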

 

However, I suspect this will run into memory errors if you have 1440 observations per round, multiple rounds per day, multiple days per ID, and multiple IDs per state.  Good luck.

 

If that is the case, you may need to model time with some sort of spline as a fixed effect, which reduces the size of the Z (random-effects design) matrix.
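
One way the spline idea could look is sketched below.  This is an untested illustration, not Steve's code: it assumes "minute" is stored as a numeric variable and is left out of the CLASS statement, and the effect name tspl, the natural cubic basis, and the choice of 5 percentile knots are all arbitrary assumptions.  The minute-level RANDOM statement is removed, so time enters only through the fixed-effect spline.

```sas
/* Sketch only: time of day as a fixed-effect spline instead of a        */
/* minute-level random effect. The name tspl and the 5 knots are assumed. */
proc glimmix data=overall method=laplace;
  class state ID day round main;      /* minute is NOT a CLASS variable here */
  /* natural cubic spline basis for the numeric variable minute */
  effect tspl = spline(minute / naturalcubic basis=tpf(noint)
                                knotmethod=percentiles(5));
  model main(order=freq ref=first) = tspl / dist=multinomial link=glogit;
  random intercept / subject=state               group=main type=vc;
  random intercept / subject=ID(state)           group=main type=vc;
  random intercept / subject=day(ID state)       group=main type=vc;
  random intercept / subject=round(day ID state) group=main type=vc;
run;
```

Whether this fits in memory still depends on the number of subjects at each level; the sketch only removes the largest block of random effects.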

 

Steve Denham
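
For reference, the day-level ICC asked about in the original question is often computed from the covariance parameter estimates with the latent-variable approach.  The expression below is a sketch under stated assumptions, not something given in this thread: it assumes a model with random intercepts only (no minute-level term), treats each nonreference category c of "main" separately (because of group=main), and takes the level-1 residual variance of a logit model to be the variance of the standard logistic distribution, pi^2/3 (about 3.29).

```latex
% Share of latent-scale variance attributable to day, for logit contrast c.
% Assumes random intercepts at state, ID, day, and round only.
\[
\mathrm{ICC}^{(c)}_{\text{day}}
  = \frac{\sigma^{2}_{\text{day},c}}
         {\sigma^{2}_{\text{state},c} + \sigma^{2}_{\text{ID},c}
          + \sigma^{2}_{\text{day},c} + \sigma^{2}_{\text{round},c}
          + \pi^{2}/3}
\]
```

If the ICC is instead meant as the correlation between two observations from the same day, the state and ID variances would also go in the numerator, so it is worth deciding which definition is wanted before reporting a number.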

khollid
Fluorite | Level 6

Thanks for your response.  I'm glad to know it is at least technically possible and how to do it.  I think I can change the dataset to have the minute not be 1-1440.  I don't have outcomes for every minute of the day, so I will pare it down to the ones that do (and time of day isn't of concern for the analysis).  I will also try to see if I can run it on the cluster in the next couple of days to see if that helps with memory (I'll have to sign up for access first).  I'll check back in and let you know how it goes once I've tried all my options.  Thanks again!

khollid
Fluorite | Level 6

I ended up changing the data structure so that I could estimate this aim with minutes of activity as the outcome instead of having a nominal outcome, and I was able to get those models to run.  However, I'm now working on another aim and I'm back to needing to work with the nominal outcome again.  I have tried a lot of things but can't get the model to work.  In this case, I again have repeated observations of physical activity locations for participants.  Each observation has an outcome of a categorical location of physical activity.  I want to test for differences in activity locations by various factors (e.g. gender), accounting in some way for the correlation between repeated observations for individuals.  At this point we are ignoring all of the other nesting; we're just trying to get this basic control for repeated measures across ID included in the model.  My code thus far is:

 

proc glimmix data=mvpa method=laplace;
class ID maincat;
model maincat (REF="0")=gender /dist=MULT link=glogit solution cl ;
RANDOM intercept / SUBJECT=ID type=vc group=maincat;
covtest/wald;
title "Main*GLogisticTest";
run;

 

Currently maincat is coded 0-9.  I previously had it as a character variable (e.g. "home" "road") but was told in a SAS help session at school that I needed to change it to 0-9.  Is that correct?  If so, am I supposed to sort by maincat before running the model?  I'm also unsure about whether or not maincat should be in the CLASS statement and about method=laplace.  I thought I needed method=laplace, but the help session suggested not specifying the METHOD= option.  I keep getting issues of either not having a valid objective function or insufficient memory.  I tried it on the cluster and SAS returned an error of "Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system.  Consider changing your model."   I did try running binary models to get starting cov parms, but supplying those parms produced an error that the parms weren't feasible.  There are about 220 IDs and 128,000 observations (max obs per subject 2800).  Any suggestions?

SteveDenham
Jade | Level 19

For a multinomial with a generalized logit link, you don't have to recode purely character strings to numeric values, but it helps when dealing with very large datasets.

 

For your analysis, perhaps consider an error structure that is common across all main categories, at least to get started.  What occurs if you drop the group=maincat option from your RANDOM statement?  Also, with as much data as you have, consider using the pseudolikelihood method, rather than the Laplace method.  Just drop method=Laplace from the PROC GLIMMIX statement.
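
Applied to the code posted in the question, those two changes might look like the sketch below.  It is untested and only transcribes the suggestions: GROUP=maincat is dropped from the RANDOM statement, and METHOD= is omitted so that GLIMMIX falls back to its default pseudo-likelihood method (RSPL).  Everything else is copied from the original post.

```sas
/* Sketch of the two simplifications; not a tested model */
proc glimmix data=mvpa;                  /* no METHOD=: default is RSPL */
  class ID maincat;
  model maincat(ref="0") = gender / dist=mult link=glogit solution cl;
  random intercept / subject=ID type=vc; /* GROUP=maincat removed */
  covtest / wald;
  title "Main*GLogisticTest";
run;
```

If this version converges, GROUP=maincat can be added back afterwards to see whether the category-specific variances are estimable.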

 

If neither of these gives reasonable answers in a reasonable time, consider subsampling your data, and then jackknifing the results.
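
The subsampling step could be sketched as follows.  This is an illustration under assumptions, not a recipe from the thread: the data set names mvpa_sorted and mvpa_sub, the 10% sampling rate, and the seed are all arbitrary, and sampling within each ID (rather than sampling whole IDs) is just one possible design.  The jackknife step is not shown; it would involve repeating the model fit on replicate samples and combining the estimates.

```sas
/* Sketch only: keep every ID but sample about 10% of each ID's rows. */
/* Data set names, rate, and seed are assumptions.                    */
proc sort data=mvpa out=mvpa_sorted;
  by ID;                         /* STRATA requires data sorted by ID */
run;

proc surveyselect data=mvpa_sorted out=mvpa_sub
                  method=srs samprate=0.10 seed=20240417;
  strata ID;                     /* simple random sample within each ID */
run;
```

The reduced data set mvpa_sub can then be passed to PROC GLIMMIX in place of mvpa.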

 

Steve Denham
