random syntax in proc mixed

master_jiang · Posted 12-20-2017 10:24 PM

I use SAS Studio, and I'm confused about the RANDOM statement in PROC MIXED. The documentation describes the SUBJECT= option of the RANDOM statement as follows:

SUBJECT=effect
SUB=effect
identifies the subjects in your mixed model. Complete independence is assumed across subjects; thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal structure in G with identical blocks. The Z matrix is modified to accommodate this block diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the RANDOM statement within the subject effect.

Continuous variables are permitted as arguments to the SUBJECT= option. PROC MIXED does not sort by the values of the continuous variable; rather, it considers the data to be from a new subject or group whenever the value of the continuous variable changes from the previous observation. Using a continuous variable decreases execution time for models with a large number of subjects or groups and also prevents the production of a large "Class Level Information" table.

When you specify the SUBJECT= option and a classification random effect, computations are usually much quicker if the levels of the random effect are duplicated within each level of the SUBJECT= effect.

Here is some example code:

proc mixed data=test;
by param;
class trt subjid;
model y=trt / ddfm=kr;
random subjid(sequence) / subject=subjid(sequence);
run;

Suppose the dataset is named test and there are four sequences. Each sequence contains five subjects, so there are 20 subjects in total. We want to fit a mixed model with treatment as a fixed effect and subject within sequence as a random effect.

What is the difference between these two statements?

1. random subjid(sequence)
2. random subjid(sequence)/subject=subjid(sequence)

Someone told me that the second statement will make SAS run faster, but why? Any help would be appreciated.

StatsMan · Posted 01-03-2018 12:11 PM

The two statements

random subject(sequence);

and

random int / subject=subject(sequence);

are equivalent in terms of the model they fit. You will get the same results with either syntax: both model a common covariance among all the observations from the same level of SUBJECT(SEQUENCE).

The second RANDOM statement is more efficient, however. That statement allows you to process your data by subjects, rather than processing the entire V matrix for the data all at once. If you check the DIMENSIONS table near the top of the PROC MIXED output, you will see an entry for number of subjects. For the first RANDOM statement above, you will see a 1 for the number of subjects since the SUBJECT= option was not used. That 1 indicates that MIXED is processing the entire V matrix at once. The entry for the number of subjects in the DIMENSIONS table for the second RANDOM statement will be equal to the number of unique values of SUBJECT(SEQUENCE) in your data.

Processing the data by subjects will save you memory and will save you execution time. With a small data set, the savings may be minimal. It may take a larger data set and model to see measurable savings.
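
To see the difference for yourself, here is a minimal sketch that fits both forms and prints only the DIMENSIONS and covariance parameter tables. The data are simulated purely for illustration (4 sequences, 5 subjects each, and an assumed 2 treatments per subject), the BY variable from the original post is left out, and SEQUENCE is added to the CLASS statement on the assumption that a variable used for nesting needs to be a classification variable. Variable names follow the original post; the numeric values are hypothetical.

```sas
/* Hypothetical data matching the layout described in the question:
   4 sequences x 5 subjects, each subject observed under 2 treatments.
   All values are simulated; nothing here comes from real data. */
data test;
   call streaminit(2017);
   do sequence = 1 to 4;
      do subjid = 1 to 5;
         u = rand('normal', 0, 2);        /* subject-level random intercept */
         do trt = 1 to 2;
            y = 10 + 1.5*(trt = 2) + u + rand('normal', 0, 1);
            output;
         end;
      end;
   end;
   drop u;
run;

/* Form 1: no SUBJECT= option, so all observations are processed as one subject */
proc mixed data=test;
   class trt sequence subjid;
   model y = trt / ddfm=kr;
   random subjid(sequence);
   ods select Dimensions CovParms;
run;

/* Form 2: random intercept for each level of subjid(sequence),
   so the data are processed subject by subject */
proc mixed data=test;
   class trt sequence subjid;
   model y = trt / ddfm=kr;
   random int / subject=subjid(sequence);
   ods select Dimensions CovParms;
run;
```

With this layout, the DIMENSIONS table for the first run should list 1 subject, and the second run should list 20. The covariance parameter estimates should agree between the two runs, with only the label of the subject-level variance component differing.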
