BookmarkSubscribeRSS Feed
CalebN
Calcite | Level 5

Hello,

I am using SAS 9.4 and trying to include covariates in my analysis. See my code below. 

My response variable is NO3, my treatment is rot, my subplot is till, and there are repeated measures. I have two covariates I would like to add to the model, GDD and date. When I add one of these covaraites to the model statement, I get the following error.

"ERROR: LINES display is not produced for the year effect because p-values are not available for all least-squares means differences."

 

proc glimmix;
class rot main till year rep;
model NO3=rot|till|year date/ddfm=kr;
random int Main(rot)/subject=rep;
Random year/residual subject=main(rep*rot*till) type=cs;
lsmeans rot|till year/pdiff=all lines;
run;

What are some potential causes of this error? Am I entering the covariates in the model correctly?

Thanks for your help!

 

10 REPLIES 10
PaigeMiller
Diamond | Level 26

@CalebN wrote:

Hello,

I am using SAS 9.4 and trying to include covariates in my analysis. See my code below. 

My response variable is NO3, my treatment is rot, my subplot is till, and there are repeated measures. I have two covariates I would like to add to the model, GDD and date. When I add one of these covaraites to the model statement, I get the following error.

"ERROR: LINES display is not produced for the year effect because p-values are not available for all least-squares means differences."

 

proc glimmix;
class rot main till year rep;
model NO3=rot|till|year date/ddfm=kr;
random int Main(rot)/subject=rep;
Random year/residual subject=main(rep*rot*till) type=cs;
lsmeans rot|till year/pdiff=all lines;
run;

What are some potential causes of this error? Am I entering the covariates in the model correctly?

Thanks for your help!

 


You did not show us how you are entering the covariates in the model. Show us that code. Click on the running man icon and paste your code into that window.

 

The problem with LSMEANS is that it often fails if you don't have in your data all possible combinations of ROT and TILL and your covariates if they are in the LSMEANS statement. Also, show us a PROC FREQ or other analysis of ROT and TILL and covariates so that we can see there is actually data for all possible combinations of ROT and TILL and covariates (do NOT simply tell us that you have all possible combinations of ROT and TILL and covariates).

--
Paige Miller
CalebN
Calcite | Level 5

Hello Paige,

I am just putting the covaraites in the model statement. I'm not sure if that's correct or not though. I had bolded date which is all I have done to include that covariate. When date or GDD is not included in this code, I get all my lsmeans correctly. I included the output of  Proc Freq for ROT, TILL, GDD, and DATE. DATE has some which have higher frequency because the date was the same for two years.

Thanks again!

image.pngimage.pngimage.png

proc glimmix;
*output out=second;
class rot main till year rep;
model NO3=rot|till|year date/ddfm=kr;
random int Main(rot)/subject=rep;
Random year/residual subject=main(rep*rot*till) type=cs;
lsmeans rot|till year/pdiff=all lines;
run;

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

If your design is a "standard" two-way factorial in a split-plot design, with whole plots in blocks and with repeated measures, then your use of main in your code looks odd to me. Your problem with including a covariate may be a problem with your basic model specification, and not necessarily with the covariate addition. (Although I'm also a bit suspicious about the role of date when year is in the model.)

 

Would 

 

proc glimmix data=<something>;
  class rot till year rep;
  model NO3=rot|till|year / ddfm=kr;
  random int rot / subject=rep;
  random year / residual subject=rep*rot*till type=<something>;
run;

be more correct for your basic model? (Note that rep*rot -- as implemented in the first random statement -- is an alternative way to identify a specific whole plot, which you might be calling main. Knowing which rep and which level of rot, one can uniquely identify which whole plot.)

 

I hope this helps.

 

Edit: How a covariate is incorporated in the statistical model depends on the experimental/sampling unit on which the covariate is measured. The covariate could be measured at the whole plot level, or at the subplot level, etc. You're more likely to get better input from the Community if you provide more information about your experimental design. For example, I'm guessing that date (and gdd) are repeated measures made within each year; if so, then when you add date, you need to modify the random statements accordingly. You might find Analysis of Messy Data, Volume III: Analysis of Covariance to be useful in sorting this out.

 

Edit 2: Looking at the tables you posted, it appears that one value of date (or gdd) is paired with one value of year. If that is true, then you cannot have both date and year in the same model--date and year are redundant, and the model will be overspecified. You have to pick one or the other.

 

For the future, posting table images is not ideal. Instead, save output to a file, e.g., PDF (not Word) using ODS OUTPUT, and attach the file.

 

 

PaigeMiller
Diamond | Level 26

Maybe I'm not understanding you, or you are not understanding me, but you first talk about putting the covariate GDD into the model, I ask to see the code, and you don't show me the code that has GDD in the model.

 

It's really hard to advise without seeing exactly what you are doing.

--
Paige Miller
CalebN
Calcite | Level 5

Sorry, I think I understand what you were saying now. I would like to use either date, or GDD as my covariate, whichever explains more variation. So the code I've tried is 

proc glimmix;
class rot main till year rep;
model NO3=rot|till|year GDD/ddfm=kr;
random int Main(rot)/subject=rep;
Random year/residual subject=main(rep*rot*till) type=cs;
lsmeans rot|till year/pdiff=all lines;
run;

or 

proc glimmix;
class rot main till year rep;
model NO3=rot|till|year date/ddfm=kr;
random int Main(rot)/subject=rep;
Random year/residual subject=main(rep*rot*till) type=cs;
lsmeans rot|till year/pdiff=all lines;
run;

Here's a bit more detail on my experimental design since I don't think I was specific enough. Soil NO3 samples were taken from each "MAIN" every four years and there are 2 "MAIN" for each ROT and TILL which are offset by 2 years in a 4 year crop rotation (ROT). This means every 2 years there is a measure of NO3 in every ROT and TILL but the repeated measure is only in every MAIN. 

NO3 is in theory supposed to be measured at the same time each year, but is sometimes a bit earlier or later depending on field conditions. DATE is the Julian date of NO3 sampling and GDD is an accumulation of temperature that year up until the sample date. All NO3 values have the same DATE and GDD value for each year. I suspect higher DATE and GDD will result in higher NO3 and would like to account for this using one of these 2 variables. 

Hopefully this is more clear.

Thanks again.

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I appreciate the additional information, but there still is not enough detail to identify a statistical model that is consistent with your experiment. Plus crop rotation studies are complicated regardless.

 

I find that it is extraordinarily useful to distinguish between design components and treatment components in an experimental design. Using your favorite search engine, search on "design structure and treatment structure"; there are several useful hits. You are currently co-mingling the two concepts. For example, you do not take samples from "each ROT and TILL" (of which there appear to be 7 x 2 = 14 combinations); instead you take samples from the experimental units assigned to levels of ROT and TILL, of which there are 14 x however many replications.

 

You may have statistical support available at your institution, perhaps even within the agricultural academic unit. If so, you would likely find it invaluable to work across a table rather than across the internet.

 

I hope this helps.

 

PaigeMiller
Diamond | Level 26

@CalebN wrote:

So the code I've tried is 

lsmeans rot|till year/pdiff=all lines;


So once again, I will state that you need to be sure that you have all possible combinations of rot and till, otherwise you cannot get LSMEANS and you probably can't get the LINES option to work either. You have to demonstrate via PROC FREQ or similar that you have all possible combinations of these two variables. Judging from the screen captures, it is unlikely that you have met this condition, but I can't be sure so that is why I say that YOU need to demonstrate this.

 

You need to run PROC FREQ with the tables command

 

tables roc * till;
--
Paige Miller
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I think the full factorial is incomplete because each Main is not measured in each year, and there are only two reps (Main) for each ROT x TILL combination. So I'd look at

 

proc tabulate data = <>;
  class rot till year;
  table rot*till, year;
  run;

or

 

proc freq data = <>;
  tables roc*till*year;
  run;
CalebN
Calcite | Level 5

Thanks for your help, I have some resources at my institution that might be able to help me further.

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

One more bit of input:

 

Here are some analysis ideas for your study.

1. Define two blocks (rep), one for the main plots first measured in 1994 and a second for main plots first measured in 1995. Block variability will include both spatial and temporal sources. See link to Loughin paper below.

2. Define a new variable (year_order) that takes values 1 through 6 (1 for 1994 and 1995, 2 for 1996 and 1997, etc.) This will prevent problems due to an incomplete factorial.

3. Fit a mixed model with year_order, something like (untested, of course):

 

proc glimmix data=<>;
  class rep rot till year_order;
  model y = rot | till | year_order / ddfm=kr2;
  random intercept rot / subject=rep;
  random year_order / subject=rep*rot*till type=<> residual;
  run;

4.  Replace year_order (which is categorical, hence ANOVA-like) in the statistical model with date (which is continuous, hence regression). As I mentioned previously, you will not have success including both year and date as fixed effects factors in the same model because there is a one-to-one correspondence between these two variables; and besides, the point is to replace year with something more mechanistically informative. As you work through this analysis, you can now consider (1) whether the relationship between y and date is linear; (2) whether and how to center the covariate (you'll want to center the covariate, but you'll have to decide which value to center on; see link to paper below); (3) whether to incorporate random slopes; (4) whether to incorporate autocorrelation among the repeated measurements within each subplot; (5) and probably other stuff that pops up during the process (e.g., distributional assumptions, variance estimation problems). Hopefully you are now a bit apprehensive about the complexity of this process, and that's good. It is complicated, and that's why a good stat consultant is so valuable. This is not a do-it-yourself project for most people who do not have extensive stat experience.

 

You'll find design insight in this paper by Tom Loughin Improved Experimental Design and Analysis for Long-Term Experiments. His setup is not exactly like yours (it has observations on each experimental unit in each year), but you'll find concepts and parallels for your study.

 

For random slopes and centering, particularly centering on values other than the overall mean, this paper is good A simple method for distinguishing within- versus between-subject effects using mixed models

 

I hope your on-campus resources work out for you. Good luck, and enjoy the process!

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2975 views
  • 0 likes
  • 3 in conversation