master_jiang
Calcite | Level 5
I use SAS Studio, and I'm confused about the RANDOM statement in PROC MIXED. The RANDOM statement has a SUBJECT= option, which the documentation describes as follows:

SUBJECT=effect
SUB=effect
identifies the subjects in your mixed model. Complete independence is assumed across subjects; thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal structure in G with identical blocks. The Z matrix is modified to accommodate this block diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the RANDOM statement within the subject effect.

Continuous variables are permitted as arguments to the SUBJECT= option. PROC MIXED does not sort by the values of the continuous variable; rather, it considers the data to be from a new subject or group whenever the value of the continuous variable changes from the previous observation. Using a continuous variable decreases execution time for models with a large number of subjects or groups and also prevents the production of a large "Class Level Information" table.

When you specify the SUBJECT= option and a classification random effect, computations are usually much quicker if the levels of the random effect are duplicated within each level of the SUBJECT= effect.

Here is some example code:

   proc mixed data=test;
      by param;
      class trt subjid;
      model y=trt / ddfm=kr;
      random subjid(sequence) / subject=subjid(sequence);
   run;

Suppose that our dataset is named test and contains four sequences. Each sequence contains five subjects, so there are 20 subjects in total. We want to fit a mixed model with treatment as a fixed effect and subject within sequence as a random effect.

What is the difference between these two statements?

1. random subjid(sequence)
2. random subjid(sequence) / subject=subjid(sequence)

Someone told me that the second statement makes SAS run faster, but why? Any help would be appreciated.
1 REPLY 1
StatsMan
SAS Super FREQ

The two statements

 

   random subject(sequence);

 

and

 

   random int / subject=subject(sequence);

 

are equivalent in terms of the model they fit. You will get the same results with either syntax: both model a common covariance among all the observations from the same level of SUBJECT(SEQUENCE).
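
To make the "common covariance" concrete, the structure implied by either statement can be written out. This is only a sketch of the usual random-intercept result, assuming the default residual structure; the symbols \sigma_s^2 (variance of the SUBJECT(SEQUENCE) effect), \sigma^2 (residual variance), n_i (number of observations on subject i), and m (number of subject levels) are introduced here for illustration and do not appear in the original post.

```latex
\[
V = Z G Z^{\top} + R
  = \operatorname{blockdiag}(V_1, \dots, V_m),
\qquad
V_i = \sigma_s^2\, J_{n_i} + \sigma^2 I_{n_i},
\]
\[
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_s^2 \quad (j \neq k),
\qquad
\operatorname{Var}(y_{ij}) = \sigma_s^2 + \sigma^2 .
\]
```

Here J_{n_i} is the n_i-by-n_i matrix of ones and I_{n_i} is the identity. Observations from different levels of SUBJECT(SEQUENCE) fall in different blocks and are therefore uncorrelated.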

 

The second RANDOM statement is more efficient, however.  That statement allows you to process your data by subjects, rather than processing the entire V matrix for the data all at once.  If you check the DIMENSIONS table near the top of the PROC MIXED output, you will see an entry for number of subjects.  For the first RANDOM statement above, you will see a 1 for the number of subjects since the SUBJECT= option was not used.  That 1 indicates that MIXED is processing the entire V matrix at once.  The entry for the number of subjects in the DIMENSIONS table for the second RANDOM statement will be equal to the number of unique values of SUBJECT(SEQUENCE) in your data.  
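
One way to check this yourself is to fit both forms and compare their Dimensions tables. The following is a minimal sketch, not the original poster's setup: the simulated data, the seed, the effect sizes, and the output data set names (dim_nosubject, dim_subject) are all hypothetical, and the BY statement from the question is omitted for brevity.

```sas
/* Hypothetical toy data: 4 sequences x 5 subjects x 2 treatments. */
/* Subject IDs repeat (1-5) within each sequence, so the nesting    */
/* subject(sequence) is needed to identify 20 distinct subjects.    */
data test;
   call streaminit(2024);                    /* arbitrary seed */
   do sequence = 1 to 4;
      do subject = 1 to 5;
         u = rand('normal', 0, 2);           /* subject-level random intercept */
         do trt = 1 to 2;
            y = 10 + 1.5*(trt = 2) + u + rand('normal', 0, 1);
            output;
         end;
      end;
   end;
   drop u;
run;

/* Form 1: random effect listed directly, no SUBJECT= option */
ods output Dimensions=dim_nosubject;
proc mixed data=test;
   class trt sequence subject;
   model y = trt / ddfm=kr;
   random subject(sequence);
run;

/* Form 2: random intercept with SUBJECT=, processed by subject */
ods output Dimensions=dim_subject;
proc mixed data=test;
   class trt sequence subject;
   model y = trt / ddfm=kr;
   random int / subject=subject(sequence);
run;

/* Compare the "Subjects" rows of the two Dimensions tables */
proc print data=dim_nosubject; run;
proc print data=dim_subject;   run;
```

If the description above holds for this toy data, the first fit should report 1 subject and the second should report 20, while the covariance parameter estimates and fixed-effect tests should agree between the two fits.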

 

Processing the data by subjects saves both memory and execution time. With a small data set, the savings may be minimal; it may take a larger data set and model to see a measurable difference.

 

 
