bmarc
Fluorite | Level 6

I am trying to run a repeated measures model in proc mixed with a large data set (~2.5 million observations) using the following code.

 

proc mixed data=IBMmods.ac method=REML;
    class year source month id;
    model DO = source / ddfm=kr solution;
    repeated month / subject=id type=cs;
    random year;
run;

 

The model runs, and the output indicates that the convergence criteria were met. However, the Solution for Fixed Effects table shows parameter estimates (albeit strange ones) with an SE of 0 and 0 degrees of freedom for each estimate, and no t-values or p-values are produced. I've tried different covariance structures with the same result. If I omit the RANDOM statement, the model runs fine and I get estimates that make sense, along with their SEs and DFs. I've also tried running the model on a subset of the data (~270,000 observations), again with the same result. Any help or insight would be greatly appreciated. I've attached a dummy data set so that you can see the structure of the data I'm working with. Thanks.

9 REPLIES
PaigeMiller
Diamond | Level 26

It sounds to me like one of your variables is always missing or constant.

--
Paige Miller
sld
Rhodochrosite | Level 12

You have typos in your example dataset, but I'll presume that's not the case in the actual dataset. (id 8 and 23 are assigned to both source 1 and 2, and I am guessing that each id should be associated with only one source.)

 

In your example dataset, each id is associated with only one source and only one year, and there are four repeated measures on each id (one for each of four months). Consequently, id is nested within year. Your current code specifies that id, year and month are random effects factors, and that source is a fixed effects factor. Because neither year nor month is in the MODEL statement, you are assuming that the mean of DO does not vary by year or by month: year and month affect only the variance of DO. Your current code also specifies that year and id are crossed random effects factors, but most of the year x id combinations have no data:

 

proc tabulate data=test;
    class id year month source;
    table source*id, year*month;
    run;

 

I suspect that these missing combinations may be the source of your estimation problem, but I am not sure.

 

Assuming that my interpretation of your study design is correct, this is the model I would first consider:

 

proc mixed data=test;
    class source id year month;
    model DO = source;
    random intercept source / subject=year;
    repeated month / subject=id(year source) type=cs;
    run;

 

I definitely would ponder whether year and/or month should be fixed effects factors rather than random effects factors, but your actual data set may have many more years and/or months than is evident in your example data set. In another thread, I made comments on the year random or fixed topic here: https://communities.sas.com/t5/SAS-Statistical-Procedures/How-to-analyze-a-split-plot-study-with-yea...

 

HTH

 

Edited: I changed the RANDOM syntax to one that likely works better with big datasets.

 

 

 

 

sld
Rhodochrosite | Level 12

I played around with the code some more, and the missing id x year combinations do not seem to be an issue. So that's not the source of your estimation problem. My apologies for heading off track.

 

bmarc
Fluorite | Level 6
No worries. Thanks for looking into this. I appreciate any suggestions that may lead to appropriate estimates.
sld
Rhodochrosite | Level 12

Me, too 🙂

 

There might be clues in the actual output or log, if you would like to post those.

 

Is the large number of observations due to many, many id levels? How many years, and how many months?
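
One quick way to answer those questions is to count the distinct levels of each factor. This is only a sketch, assuming the variables are named as in the posted code (`id`, `year`, `month`) and the data are in `IBMmods.ac`; the column aliases are made up for illustration.

```sas
/* Count distinct levels of each design factor and the total rows. */
/* Dataset name taken from the original post; aliases are illustrative. */
proc sql;
    select count(distinct id)    as n_id,
           count(distinct year)  as n_year,
           count(distinct month) as n_month,
           count(*)              as n_obs
    from IBMmods.ac;
quit;
```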

 

 

bmarc
Fluorite | Level 6

This is all that's displayed in the log when I run the model. There doesn't appear to be anything indicating what the issue might be.

 

3    proc mixed data=IBMmods.actest method=REML;
NOTE: Writing HTML Body file: sashtml.htm
4        class year source month id;
5        model DO = source / ddfm=kr solution;
6        repeated month / subject=id type=cs;
7        random year;
8    run;

WARNING: Class levels for ID are not printed because of excessive size.
WARNING: ODS graphics with more than 5000 points have been suppressed. Use the PLOTS(MAXPOINTS= ) option in the PROC MIXED
         statement to change or override the cutoff.
NOTE: Convergence criteria met.
NOTE: PROCEDURE MIXED used (Total process time):
      real time           36.96 seconds
      cpu time            36.45 seconds

 

And the model output is attached.

There are many ids in the model spanning 4 months for each of 27 years. Individuals are different for each level of year and source. Could the large number of individuals cause problems when trying to look at the random effect of year?

 

I also just noticed in this output that while the Class Level Information says there are 731939 IDs, the Dimensions table lists only one subject. Additionally, the output indicates that all the observations are attributed to that one subject. Any thoughts on why this is happening?

sld
Rhodochrosite | Level 12

Hmm. 

 

It seems odd that the parameter estimates for intercept and source have 0 SE and 0 df, and yet the overall test of source in the Type III Tests table does not look unusual--except for denom df = 973 which strikes me as much too small. 

Is each ID coded uniquely, as in your example dataset? 

Should you have four months of data for each ID? (731939 IDs times 4 months would be 2927756 observations, not the 2443672 reported, yet no missing values are reported.)
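
Both questions can be checked directly. The sketch below assumes the dataset is `IBMmods.ac` with the variable names from the posted code; `id_check` and the `n_*` columns are hypothetical names used only for illustration.

```sas
/* One row per id: how many observations, months, years and sources it has. */
/* If every id is unique and complete, expect n_obs = 4, n_month = 4,       */
/* n_year = 1 and n_source = 1.                                              */
proc sql;
    create table id_check as
    select id,
           count(*)               as n_obs,
           count(distinct month)  as n_month,
           count(distinct year)   as n_year,
           count(distinct source) as n_source
    from IBMmods.ac
    group by id;
quit;

/* Summarize the patterns; n_year > 1 or n_source > 1 means an id value */
/* is reused across years or sources.                                   */
proc freq data=id_check;
    tables n_obs n_month n_year n_source / nocum;
run;
```

If id values turn out to be reused across years or sources, SUBJECT=id alone would pool observations from different individuals, whereas SUBJECT=id(year source) would keep them apart.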

 

I'm beginning to suspect a structural problem with the dataset, perhaps only because I don't have any other ideas.

 

If you haven't already, I'd compute descriptive statistics to follow up on Paige's comment about one of the variables being always missing or constant. 
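
A minimal sketch of those descriptive checks, assuming the dataset `IBMmods.ac` and the variable names from the posted code:

```sas
/* Response variable: N, number missing, and spread.          */
/* A standard deviation of 0 would flag a constant variable.  */
proc means data=IBMmods.ac n nmiss mean std min max;
    var DO;
run;

/* Level counts for the classification variables, including missing levels. */
/* id is left out because it has too many levels to print usefully.         */
proc freq data=IBMmods.ac;
    tables year source month / missing;
run;
```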

 

For your model with REPEATED / TYPE=CS, the code below is a different parameterization of the same model (as long as the CS parameter is not negative). I'd try it, and see if I got the same results.

 

proc mixed data=test;
    class source id year month;
    model DO = source / ddfm=kr solution;
    random intercept  / subject=year;
    random intercept  / subject=id(year source);
    run;
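
The equivalence, and the reason for the caveat about a negative CS parameter, can be seen from the implied covariance within one id. In this sketch the symbols are my own notation, not SAS output labels: $b_i$ is the random intercept for id $i$ with variance $\sigma_b^2$, and $e_{ij}$ is the residual for month $j$ with variance $\sigma^2$.

```latex
\begin{align*}
y_{ij} &= \mu_{ij} + b_i + e_{ij}, \qquad
  b_i \sim N(0, \sigma_b^2), \quad e_{ij} \sim N(0, \sigma^2) \\
\operatorname{Var}(y_{ij}) &= \sigma_b^2 + \sigma^2 \\
\operatorname{Cov}(y_{ij}, y_{ik}) &= \sigma_b^2 \qquad (j \neq k)
\end{align*}
```

This is the compound symmetry pattern: a common variance on the diagonal and a common covariance off the diagonal. Because $\sigma_b^2$ is a variance it cannot be negative, whereas the covariance parameter under TYPE=CS can be, so the two parameterizations agree only when that covariance is non-negative.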

And there's always SAS Tech Support!

tianlin_wang
SAS Employee

Hi, I support PROC MIXED at SAS. Is it possible for you to email me your log file or post the log file here?

tianlin_wang
SAS Employee

Please ignore my request for the log file; I noticed it was already posted here. Can you try running it with the option kr=residual? If you still have the SE=0 issue, it would be best to email me your complete dataset so that I can replicate the issue and fix the bug.
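
One possible reading of that suggestion, and it is only an assumption, is to replace the Kenward-Roger degrees-of-freedom method with the residual method through the DDFM= option on the MODEL statement. Under that reading the original model would be rerun as:

```sas
proc mixed data=IBMmods.ac method=REML;
    class year source month id;
    /* ddfm=residual replaces ddfm=kr; this is an assumed reading of "kr=residual" */
    model DO = source / ddfm=residual solution;
    repeated month / subject=id type=cs;
    random year;
run;
```

If the standard errors become nonzero with this change, that would point toward the Kenward-Roger adjustment rather than the covariance estimates themselves.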

 

 

 
